Abstract

We show that any quantum density matrix can be represented by a Bayesian network 
(a directed acyclic graph), and also by a Markov network (an undirected graph). We 
show that any Bayesian or Markov net that represents a density matrix is logically
equivalent to a set of conditional independencies (symmetries) satisfied by the density 
matrix. We show that the d-separation theorems of classical Bayesian and Markov 
networks generalize in a simple and natural way to quantum physics. The quantum 
d-separation theorems are shown to be closely connected to quantum entanglement. 
We show that the graphical rules for d-separation can be used to detect pairs of nodes 
(or of node sets) in a graph that are unentangled. CMI entanglement (a.k.a. squashed 
entanglement), a measure of entanglement originally discovered by analyzing Bayesian 
networks, is an important part of the theory of this paper. 
1 Introduction



A Bayesian network is a directed graph; that is, a set of nodes with arrows connecting 
some pairs of these nodes. Each node is assigned a transition matrix. For a classical 
Bayesian net, each transition matrix is real, and the product of the transition matrices
for all the nodes gives a joint probability distribution for the states of all the nodes. 
For a quantum Bayesian net, each transition matrix is complex, and the product of 
the transition matrices gives a joint probability amplitude instead. 

A Markov network is an undirected graph; that is, a set of nodes with undirected
links connecting some pairs of these nodes. Each super-clique (maximal fully-connected
subgraph) of the graph is assigned an affinity. For a classical Markov net,
each affinity is real, and their product gives a joint probability distribution for the 
states of all the nodes. For a quantum Markov net, each affinity is complex, and their 
product gives a joint probability amplitude instead. 

Bayesian and Markov networks will be defined more precisely later on in this paper.

The literature on classical Bayesian nets is vast. Some textbooks that were
invaluable in writing this paper are Refs. [1],[2]. Classical Bayesian nets were invented
by geneticist Sewall Wright [3] in the early 1930's. The theory of Bayesian nets was
extended substantially by Judea Pearl [4][5][6] and collaborators in the late 1980's.
They gave us the theory that culminates in the d-separation rules. See Scheines [7] for
a more complete review of the history of d-separation. Nowadays, classical Bayesian
nets are used widely in data mining, AI, etc.

There exist only a small number of papers on quantum Bayesian nets. The
first paper [8] on the subject appears to be mine. Since then, I have written several
papers applying quantum Bayesian nets to quantum information theory [9] and quantum
computing [10]. I have also written a Mac application called Quantum Fog [11]
(freeware but patented) that implements the ideas behind quantum Bayesian networks.
Laskey has also written some papers [12] about quantum Bayesian nets.

It's known that any probability distribution can be represented by a Bayesian 
net, and also by a Markov net. It's known that any Bayesian or Markov net that
represents a probability distribution is logically equivalent to a set of conditional
independencies satisfied by the probability distribution.

In this paper, we show that the last paragraph remains true if we replace
"probability distribution" by "density matrix".

We also show that the d-separation theorems of classical Bayesian and Markov 
networks generalize in a simple and natural way to quantum physics. The quantum 
d-separation theorems are shown to be closely connected to quantum entanglement. 
We show that the graphical rules for d-separation can be used to detect pairs of
nodes (or of node sets) in a graph that are unentangled. CMI entanglement (a.k.a.
squashed entanglement) [13], a measure of entanglement originally discovered by
analyzing Bayesian networks, is an important part of the theory of this paper.
This paper is fairly self-contained; readers previously acquainted with quantum
physics but not with classical Bayesian nets should have no trouble following this 
paper. Results about classical Bayesian nets are derived in parallel with those about 
their quantum brethren. The paper has pretensions of being pedagogical. 

2 Notation and Other Preliminaries 

In this section, we define some notation, and review various prerequisite ideas that 
will be used in the rest of the paper. 

2.1 General Notation 

As usual, $Z$, $R$, $C$ will denote the integers, real numbers, and complex numbers,
respectively. Let $Bool = \{0, 1\}$, with $0$ = false and $1$ = true. For $a, b \in Z$ such that $a \leq b$,
let $Z_{a,b} = \{a, a+1, \ldots, b\}$.

For any set $J$, let $|J|$ denote the number of elements in $J$.

For any set $J$, its power-set is defined as $\{J' : J' \subseteq J\}$. This set includes the
empty set and the full set $J$. The power-set of $J$ is often denoted by $2^J$ because
$|2^J| = 2^{|J|}$.

Let $\delta^x_y = \delta(x, y)$ denote the Kronecker delta function; it equals 1 if $x = y$ and
0 if $x \neq y$.

For any matrix $M \in C^{p \times q}$, $M^*$ will denote its complex conjugate, $M^T$ its
transpose, and $M^\dagger = M^{*T}$ its Hermitian conjugate. Let $\mathrm{diag}(x_1, x_2, \ldots, x_r)$ denote a
diagonal matrix with diagonal entries $x_1, x_2, \ldots, x_r$.

For any $z \in C$, $\mathrm{phase}(z)$ will denote its phase. If $r, \theta \in R$, $\mathrm{phase}(r e^{i\theta}) =
\theta + 2\pi Z$.

For any expression $f(x)$, we will sometimes abbreviate

$$\frac{f(x)}{\sum_x f(x)} = \frac{f(x)}{\sum_x \mathrm{numerator}} . \qquad (1)$$

The abbreviation with the word "numerator" is especially helpful when f(x) is a long 
expression, and we want to write it only once instead of twice. 
For $f_1, f_2 \in C$, let

$$\left[\begin{array}{l} f_1 \\ \times f_2 \end{array}\right] = f_1 \times f_2 . \qquad (2)$$

This notation saves horizontal space: it allows us to indicate the product of two 
numbers with the numbers written in a column instead of a row. 

Given expressions A,B,X,Y, we will often say things like "A (ditto, X) is B 
(ditto, Y)"; by this, we will mean that "A is B" and "X is Y". 
2.2 Classical Probability Theory and Quantum Physics Preliminaries

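Random variables will be denoted by underlined letters; e.g., $\underline{a}$. The set of values
(states) that $\underline{a}$ can assume will be denoted by $St_{\underline{a}}$. Let $N_{\underline{a}} = |St_{\underline{a}}|$. The probability
that $\underline{a} = a$ will be denoted by $P(\underline{a} = a)$ or $P_{\underline{a}}(a)$, or simply by $P(a)$ if the latter will
not lead to confusion in the context in which it is used. We will use $pd(St_{\underline{a}})$ to denote
the set of all probability distributions with domain $St_{\underline{a}}$.

In this paper, we consider networks with $N$ nodes. Each node is labelled by
a random variable $\underline{x}_j$, where $j \in Z_{1,N}$. For any $J \subseteq Z_{1,N}$, the ordered set of random
variables $\underline{x}_j\ \forall j \in J$ (ordered so that the integer indices $j$ increase from left to right)
will be denoted by $(\underline{x}_\cdot)_J$ or $\underline{x}_J$. For example, $(\underline{x}_\cdot)_{\{2,4\}} = \underline{x}_{\{2,4\}} = (\underline{x}_2, \underline{x}_4)$. We will
often call the values that $\underline{x}_J$ can assume $(x_\cdot)_J$ or $x_J$. For example, $(x_\cdot)_{\{2,4\}} = x_{\{2,4\}} =
(x_2, x_4)$. We will often abbreviate $(\underline{x}_\cdot)_{Z_{1,N}}$ or $\underline{x}_{Z_{1,N}}$ by just $(\underline{x}_\cdot)$ or $\underline{x}_\cdot$. We will often
call the values that $\underline{x}_\cdot$ can assume $(x_\cdot)$ or $x_\cdot$.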

In this paper, we will often divide by probabilities without specifying that
they should be non-zero. Most of the time, this cavalier attitude will not get us into
trouble. That's because one can always replace all vanishing probabilities by a positive
infinitesimal $\epsilon$. Our results can then be expressed as a power series in $\epsilon$. As long as
our inferences depend only on terms that are zeroth order in $\epsilon$, our inferences will be
well-defined and unique as $\epsilon$ tends to 0. There are, however, situations when dividing
by a probability can be fatal. Such situations ultimately boil down to trying to infer
something from terms that are first order in $\epsilon$; for example, when we erroneously
conclude that $A\epsilon = 0$ implies $A = 0$. In the future, we will divide by probabilities
without assuming that they should be non-zero, except in those cases when doing so
is being used to infer something that becomes false when $\epsilon \rightarrow 0$.

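In quantum physics, $\underline{a}$ has a fixed, orthonormal basis $\{|a\rangle : a \in St_{\underline{a}}\}$ associated
with it. The vector space spanned by this basis will be denoted by $\mathcal{H}_{\underline{a}}$. In quantum
physics, instead of probabilities $P(\underline{a} = a)$, we use "probability amplitudes" (or just
"amplitudes" for short) $A(\underline{a} = a)$ (also denoted by $A_{\underline{a}}(a)$ or $A(a)$). Whereas $P \geq 0$
and $\sum_a P(a) = 1$, $\sum_a |A|^2(a) = 1$. Besides probability amplitudes, we also use density
matrices. A density matrix is a Hermitian, non-negative, unit trace operator
acting on $\mathcal{H}_{\underline{a}}$. We will use $dm(\mathcal{H}_{\underline{a}})$ to denote the set of all density matrices acting on
$\mathcal{H}_{\underline{a}}$.

If $\rho_{\underline{x}} \in dm(\mathcal{H}_{\underline{x}})$, $\rho_{\underline{x},\underline{a}} \in dm(\mathcal{H}_{\underline{x},\underline{a}})$, and $\rho_{\underline{x}} = \mathrm{tr}_{\underline{a}}(\rho_{\underline{x},\underline{a}})$, we will say that $\rho_{\underline{x}}$ is a
partial trace of $\rho_{\underline{x},\underline{a}}$, and $\rho_{\underline{x},\underline{a}}$ is a traced dm-extension of $\rho_{\underline{x}}$. Given a density
matrix $\rho_{\underline{x}_1,\underline{x}_2,\underline{x}_3,\ldots} \in dm(\mathcal{H}_{\underline{x}_1,\underline{x}_2,\underline{x}_3,\ldots})$, its partial traces will be denoted by omitting
its subscripts for the random variables that have been traced over. For example,
$\rho_{\underline{x}_2} = \mathrm{tr}_{\underline{x}_1,\underline{x}_3}\, \rho_{\underline{x}_1,\underline{x}_2,\underline{x}_3}$.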

1 We will use random variables in both classical and quantum physics. Normally, random variables
are defined only in classical physics, where they are defined to be functions from an outcome space
to a range of values. For technical simplicity, here we define a random variable $\underline{a}$, in both classical
and quantum physics, to be merely the label of a node in a graph, or an n-tuple $\underline{x}_K$ of such labels.
We will sometimes abbreviate $|a\rangle\langle a|$ by $\mathrm{proj}(|a\rangle)$. This abbreviation is especially
convenient when the label $a$ is a long expression, for then we only have to write
$a$ once instead of twice.

2.3 Graph Theory Preliminaries 

Next, we review some basic definitions from Graph Theory. 

A graph $G$ is a pair $(V, E)$, where $V$ is a set of nodes (vertices), and $E$ is a
set of connections (edges) between some pairs of these nodes. (No self-connections
allowed.) A subgraph $G' = (V', E')$ of a graph $G = (V, E)$ is a graph such that
$V' \subseteq V$, and $E'$ is defined as the subset of $E$ that survives after we erase from $E$ all
edges that mention a node in $V - V'$.

We will abbreviate Directed Acyclic Graph by DAG. A DAG is a graph
with arrows as its edges, and without any cycles. A cycle is a finite sequence of
arrows that one can follow, in the direction of the arrows, and come back to where
one started. The set of all possible DAGs with node labels $\underline{x}_\cdot$ will be denoted by
$DAG(\underline{x}_\cdot)$.

We will abbreviate Undirected Graph by UG. An UG is a graph with (undirected)
links as its edges. The set of all possible UGs with node labels $\underline{x}_\cdot$ will be
denoted by $UG(\underline{x}_\cdot)$.

One can also define hybrid graphs that contain both arrows and undirected
links [2][1], but we won't consider them in this paper.

Consider a DAG whose nodes are labelled by $\underline{x}_\cdot$. Any node $\underline{x}_j$ has parent
nodes (those with arrows pointing from them to $\underline{x}_j$) and children nodes (those with
arrows pointing from $\underline{x}_j$ to them). $pa(j), ch(j) \subseteq Z_{1,N}$ are defined as the sets of
integer indices of the parent and children nodes of $\underline{x}_j$. For example, in Fig. 1(a),
$pa(4) = \{2, 3\}$ and $ch(1) = \{2, 3\}$. $an(j), de(j) \subseteq Z_{1,N}$ are defined as the sets of
integer indices of the ancestor and descendant nodes of $\underline{x}_j$. That is, $an(j) =
pa(j) \cup pa^2(j) \cup pa^3(j) \cup \ldots$. By this we mean that $an(j)$ is obtained by taking the
union of the integer indices of the parents of $\underline{x}_j$, and of the parents of the parents
of $\underline{x}_j$, and of the parents of the parents of the parents of $\underline{x}_j$, and so on. Likewise,
$de(j) = ch(j) \cup ch^2(j) \cup ch^3(j) \cup \ldots$. The set of integer indices of the non-descendants
of $\underline{x}_j$ will be denoted by $\neg de(j) = Z_{1,N} - de(j) - \{j\}$. The set of integer indices of
the non-ancestors of $\underline{x}_j$ will be denoted by $\neg an(j) = Z_{1,N} - an(j) - \{j\}$. Let
$\overline{s}(j) = s(j) \cup \{j\}$ for $s \in \{pa, ch, an, de, \neg de, \neg an\}$. In other words, we will use an
overline over a set $s(j)$ that does not include $j$ to denote the "closure" set obtained
by adding $j$ to $s(j)$.
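
The parent, child, ancestor and descendant sets defined above can be computed mechanically. The following is a minimal sketch, assuming the DAG of Fig. 1(a) has the arrows 1→2, 1→3, 2→4, 3→4 (which is what the stated values of pa(4) and ch(1) and the factorization of the net's amplitude imply). The function names are illustrative only and are not taken from any existing software.

```python
# Arrows (parent, child) assumed for the DAG of Fig. 1(a).
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
nodes = {1, 2, 3, 4}

def pa(j):
    """Indices of the parents of node j."""
    return {a for (a, b) in edges if b == j}

def ch(j):
    """Indices of the children of node j."""
    return {b for (a, b) in edges if a == j}

def closure(step, j):
    """Union of step(j), step(step(j)), ...; used for an(j) and de(j)."""
    found, frontier = set(), step(j)
    while frontier:
        found |= frontier
        frontier = set().union(*(step(k) for k in frontier)) - found
    return found

def an(j):
    return closure(pa, j)

def de(j):
    return closure(ch, j)

def non_de(j):
    """Non-descendants: all nodes minus de(j) minus j itself."""
    return nodes - de(j) - {j}

def non_an(j):
    """Non-ancestors: all nodes minus an(j) minus j itself."""
    return nodes - an(j) - {j}
```

For instance, `pa(4)` and `ch(1)` both evaluate to `{2, 3}`, matching the example in the text, and the "closure" set with an overline is simply `an(j) | {j}`, and similarly for the other sets.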

Next consider an UG whose nodes are labelled by $\underline{x}_\cdot$. Any node $\underline{x}_j$ has
neighbor nodes (those with links between $\underline{x}_j$ and them). $ne(j)$ is defined as the set
of integer indices of the neighbor nodes of $\underline{x}_j$. For example, in Fig. 1(b), $ne(2) = \{1, 4\}$.
We will also use $\overline{ne}(j) = ne(j) \cup \{j\}$.
For either a DAG or an UG, a path from node $\underline{x}$ to node $\underline{y}$ is a finite
sequence of nodes, starting with x and ending with y, such that adjacent nodes in 
the sequence are connected. Note that for a DAG, the arrows in a path need not all 
be oriented in the same sense. If they are, we call the path a directed path. 

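In a DAG, a path from $\underline{x}$ to $\underline{y}$ can have 3 (mutually exclusive and exhaustive)
types of nodes. A serial node $\underline{a}$ equals one of the endpoints ($\underline{x}$ and $\underline{y}$), or else, it is
connected to its path neighbors in this

$$\rightarrow (\underline{a}) \rightarrow \qquad (3)$$

or this

$$\leftarrow (\underline{a}) \leftarrow \qquad (4)$$

manner. A divergence node $\underline{a}$ is connected to its path neighbors in this

$$\leftarrow (\underline{a}) \rightarrow \qquad (5)$$

manner. A convergence (a.k.a. collider) node $\underline{a}$ is connected to its path neighbors
in this

$$\rightarrow (\underline{a}) \leftarrow \qquad (6)$$

manner.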

A DAG (ditto, an UG) is fully connected if it is impossible to add any more
legal arrows (ditto, links) to it. A fully connected subgraph (of either a DAG or an
UG) is called a clique. A clique for which there is no larger clique that contains it
is called a super-clique. For any graph $G$, we define $super\text{-}cliques(G)$ (a subset
of $2^{Z_{1,N}}$) to be the set of the super-cliques of $G$. For example, $super\text{-}cliques(G)$ for
both graphs in Fig. 1 is $\{\{1, 2\}, \{1, 3\}, \{2, 4\}, \{3, 4\}\}$.
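
The super-cliques of a small graph can be found by brute force. The sketch below assumes the undirected graph of Fig. 1(b) has the links 1-2, 1-3, 2-4, 3-4 (consistent with ne(2) = {1, 4}). The helper names are hypothetical, and the enumeration over all subsets is exponential in the number of nodes, so it is meant only as an illustration of the definition.

```python
from itertools import combinations

# Undirected links assumed for the graph of Fig. 1(b).
links = [(1, 2), (1, 3), (2, 4), (3, 4)]
nodes = [1, 2, 3, 4]
linked = {frozenset(pair) for pair in links}

def is_clique(subset):
    """True if every pair of nodes in subset is connected."""
    return all(frozenset(p) in linked for p in combinations(subset, 2))

def super_cliques():
    """Cliques that are not strictly contained in a larger clique."""
    cliques = [set(c)
               for r in range(1, len(nodes) + 1)
               for c in combinations(nodes, r)
               if is_clique(c)]
    return [c for c in cliques if not any(c < d for d in cliques)]
```

Calling `super_cliques()` on this graph yields the four two-node sets listed in the text.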





Figure 1: (a) An example of a Bayesian net. (b) An example of a Markov net.



A classical Bayesian network is a DAG with labelled nodes (let $(\underline{x}_\cdot)_{Z_{1,N}}$ be
the labels), together with a transition matrix $P(x_j | x_{pa(j)})$ associated with each node
$\underline{x}_j$ of the graph. The quantities $P(x_j | x_{pa(j)})$ are probabilities; they are non-negative
and satisfy $\sum_{x_j} P(x_j | x_{pa(j)}) = 1$. The probability of the whole net is defined as the
product of the probabilities of the nodes.






A quantum Bayesian network is a DAG with labelled nodes (let $(\underline{x}_\cdot)_{Z_{1,N}}$ be
the labels), together with a transition matrix $A(x_j | x_{pa(j)})$ associated with each node
$\underline{x}_j$ of the graph. The quantities $A(x_j | x_{pa(j)})$ are probability amplitudes; they satisfy
$\sum_{x_j} |A|^2(x_j | x_{pa(j)}) = 1$. The probability amplitude of the whole net is defined as the
product of the amplitudes of the nodes. For example, for the quantum Bayesian net
of Fig. 1(a), one has

$$A(x_1, x_2, x_3, x_4) = A(x_4 | x_2, x_3) A(x_3 | x_1) A(x_2 | x_1) A(x_1) , \qquad (7)$$

where $x_j \in St_{\underline{x}_j}$ for $j = 1, 2, 3, 4$.
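
As a numerical illustration of Eq. (7), the sketch below builds a four-node quantum Bayesian net with binary nodes and checks that the product of the node amplitudes is a normalized joint amplitude. The particular amplitude values, and the helper `column`, are arbitrary choices made for this example; the only property used is that each transition amplitude is normalized over the states of its own node.

```python
import cmath
import math
from itertools import product

def column(theta, phi):
    """A normalized pair of amplitudes (cos(theta), e^{i phi} sin(theta))."""
    return [math.cos(theta), cmath.exp(1j * phi) * math.sin(theta)]

# Hypothetical node amplitudes for binary nodes x1..x4 of Fig. 1(a).
A1 = column(0.3, 0.0)                                    # A(x1)
A2 = {x1: column(0.5 + x1, 1.0) for x1 in (0, 1)}        # A(x2|x1)
A3 = {x1: column(1.1 - x1, 2.0) for x1 in (0, 1)}        # A(x3|x1)
A4 = {(x2, x3): column(0.2 + x2 + 2 * x3, 0.7)           # A(x4|x2,x3)
      for x2, x3 in product((0, 1), repeat=2)}

def amplitude(x1, x2, x3, x4):
    """Joint amplitude of Eq. (7): product of the node amplitudes."""
    return A4[(x2, x3)][x4] * A3[x1][x3] * A2[x1][x2] * A1[x1]

# Sum of |A|^2 over all joint states; should be 1 up to rounding.
total = sum(abs(amplitude(*x)) ** 2 for x in product((0, 1), repeat=4))
```

Because each node's amplitudes are normalized for every configuration of its parents, summing $|A|^2$ over $x_4$, then $x_3$, $x_2$ and $x_1$ gives 1 without any extra normalization constant.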

A classical (ditto, quantum) Markov network is an UG with labelled
nodes (let $(\underline{x}_\cdot)_{Z_{1,N}}$ be the labels), together with an affinity $\phi(x_K)$ (ditto, $\alpha(x_K)$)
associated with each super-clique $K$ of the graph. The probability (ditto, probability
amplitude) of the whole net is defined as the normalized product of the affinities of
the super-cliques of $G$. For example, for the quantum Markov net of Fig. 1(b), one
has

$$A(x_1, x_2, x_3, x_4) = \frac{\alpha(x_4, x_3) \alpha(x_4, x_2) \alpha(x_3, x_1) \alpha(x_2, x_1)}{\sqrt{\sum_{x_1, x_2, x_3, x_4} |\mathrm{numerator}|^2}} , \qquad (8)$$

where $x_j \in St_{\underline{x}_j}$ for $j = 1, 2, 3, 4$.
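
A quantum Markov net amplitude of the kind just described can be sketched in the same spirit. The affinity function `alpha` below is an arbitrary made-up example, and the normalization is assumed to be the one that makes the squared magnitudes of the joint amplitude sum to 1 (division by the square root of the summed squared magnitudes of the numerator).

```python
import cmath
import math
from itertools import product

def alpha(a, b):
    """Hypothetical complex affinity for a linked pair of binary nodes."""
    return (1 + a + 2 * b) * cmath.exp(0.5j * a * b)

def numerator(x1, x2, x3, x4):
    """Product of the affinities of the four super-cliques of Fig. 1(b)."""
    return alpha(x4, x3) * alpha(x4, x2) * alpha(x3, x1) * alpha(x2, x1)

states = list(product((0, 1), repeat=4))
# Assumed normalization: square root of the sum of |numerator|^2.
norm = math.sqrt(sum(abs(numerator(*x)) ** 2 for x in states))

def amplitude(x1, x2, x3, x4):
    """Normalized joint amplitude of the quantum Markov net."""
    return numerator(x1, x2, x3, x4) / norm
```

Unlike the Bayesian net case, the individual affinities need not be normalized, which is why an overall normalization constant is required.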

We will sometimes use G to denote a Bayesian (ditto, Markov) network associated
with a DAG (ditto, UG) G.



2.4 Information Theory Preliminaries 

Next, we review some basic definitions from Information Theory [14]. 

First consider classical physics. For any $P \in pd(St_{\underline{x}})$, the entropy (a measure
of the variance of $P$) is defined by

$$H(\underline{x}) = -\sum_x P(x) \ln P(x) . \qquad (9)$$

Sometimes the entropy is denoted instead by $H(P_{\underline{x}})$. CMI (usually pronounced "see-me")
stands for "Conditional Mutual Information". For $P \in pd(St_{\underline{x},\underline{y},\underline{e}})$, the CMI (a
measure of conditional information transmission) is defined by

$$H(\underline{x} : \underline{y} | \underline{e}) = \sum_{x,y,e} P(x, y, e) \ln \frac{P(x, y | e)}{P(x|e) P(y|e)} . \qquad (10)$$

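In general, $H(\underline{x} : \underline{y} | \underline{e}) \geq 0$. When $N_{\underline{e}} = 1$, CMI degenerates into the mutual
information $H(\underline{x} : \underline{y})$. Note that

$$H(\underline{x} : \underline{y} | \underline{e}) = \sum_{x,y,e} P(x, y, e) \ln \frac{P(x, y, e) P(e)}{P(x, e) P(y, e)} \qquad (11a)$$

$$= H(\underline{x}, \underline{e}) + H(\underline{y}, \underline{e}) - H(\underline{x}, \underline{y}, \underline{e}) - H(\underline{e}) . \qquad (11b)$$

Classical CMI satisfies the chain rule

$$H(\underline{x} : \underline{y}_1, \underline{y}_2 | \underline{e}) = H(\underline{x} : \underline{y}_1 | \underline{y}_2, \underline{e}) + H(\underline{x} : \underline{y}_2 | \underline{e}) . \qquad (12)$$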
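
The two expressions for the classical CMI, Eqs. (11a) and (11b), can be compared numerically. The sketch below uses an arbitrary, strictly positive joint distribution over three binary variables (the weights are made up for illustration) and computes the CMI both ways.

```python
import math
from itertools import product

# Hypothetical joint distribution P(x, y, e) over three binary variables.
weights = {s: 1 + s[0] + 2 * s[1] * s[2] + s[0] * s[2]
           for s in product((0, 1), repeat=3)}
Z = sum(weights.values())
P = {s: w / Z for s, w in weights.items()}

def marginal(keep):
    """Marginal distribution over the positions listed in keep."""
    out = {}
    for s, p in P.items():
        key = tuple(s[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(dist):
    """Entropy -sum p ln p of a distribution given as a dict."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

Pxe, Pye, Pe = marginal((0, 2)), marginal((1, 2)), marginal((2,))

# Eq. (11a): sum of P(x,y,e) ln[P(x,y,e) P(e) / (P(x,e) P(y,e))].
cmi_11a = sum(p * math.log(p * Pe[(e,)] / (Pxe[(x, e)] * Pye[(y, e)]))
              for (x, y, e), p in P.items() if p > 0)

# Eq. (11b): H(x,e) + H(y,e) - H(x,y,e) - H(e).
cmi_11b = entropy(Pxe) + entropy(Pye) - entropy(P) - entropy(Pe)
```

The two values agree up to rounding error and are non-negative, as the text states for the classical CMI.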
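Now consider quantum physics. For $\rho_{\underline{x}} \in dm(\mathcal{H}_{\underline{x}})$, the entropy is defined by

$$S(\underline{x}) = -\mathrm{tr}_{\underline{x}}(\rho_{\underline{x}} \ln \rho_{\underline{x}}) . \qquad (13)$$

Sometimes the entropy is denoted instead by $S(\rho_{\underline{x}})$ or by $S_\rho(\underline{x})$, where $\rho$ is a traced
dm-extension of $\rho_{\underline{x}}$. For $\rho_{\underline{x},\underline{y},\underline{e}} \in dm(\mathcal{H}_{\underline{x},\underline{y},\underline{e}})$, the CMI is defined by analogy to
Eq. (11b):

$$S(\underline{x} : \underline{y} | \underline{e}) = S(\rho_{\underline{x},\underline{e}}) + S(\rho_{\underline{y},\underline{e}}) - S(\rho_{\underline{x},\underline{y},\underline{e}}) - S(\rho_{\underline{e}}) . \qquad (14)$$

In general, $S(\underline{x} : \underline{y} | \underline{e}) \geq 0$ (this is known as the Strong Subadditivity of quantum
entropy). Sometimes the CMI is denoted instead by $S_\rho(\underline{x} : \underline{y} | \underline{e})$, where $\rho$ is a traced
dm-extension of $\rho_{\underline{x},\underline{y},\underline{e}}$. When $N_{\underline{e}} = 1$, CMI degenerates into the mutual information
$S(\underline{x} : \underline{y})$. Just like classical CMI, quantum CMI satisfies the chain rule

$$S(\underline{x} : \underline{y}_1, \underline{y}_2 | \underline{e}) = S(\underline{x} : \underline{y}_1 | \underline{y}_2, \underline{e}) + S(\underline{x} : \underline{y}_2 | \underline{e}) . \qquad (15)$$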

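Given $\rho_{\underline{x},\underline{y}} \in dm(\mathcal{H}_{\underline{x},\underline{y}})$, the CMI entanglement (an information theoretic
measure of quantum entanglement) is defined as

$$E_{CMI}(\underline{x} : \underline{y}) = \frac{1}{2} \inf_{\rho_{\underline{x},\underline{y},\underline{e}} \in K} S_{\rho_{\underline{x},\underline{y},\underline{e}}}(\underline{x} : \underline{y} | \underline{e}) , \qquad (16)$$

where the infimum (a generalized minimum) is taken over the set $K$ of all density
matrices $\rho_{\underline{x},\underline{y},\underline{e}} \in dm(\mathcal{H}_{\underline{x},\underline{y},\underline{e}})$ such that $\mathrm{tr}_{\underline{e}}\, \rho_{\underline{x},\underline{y},\underline{e}} = \rho_{\underline{x},\underline{y}}$. Sometimes, the CMI
entanglement is denoted instead by $E_{CMI}(\rho_{\underline{x},\underline{y}})$, or by $E^{CMI}_\rho(\underline{x} : \underline{y})$, where $\rho$ is a traced
dm-extension of $\rho_{\underline{x},\underline{y}}$. CMI entanglement is also known by the less scientific name
of "squashed entanglement". For more information about CMI entanglement, see
Ref. [13].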

If we apply the definition of CMI entanglement to the right hand side of
Eq. (15), we get

$$S(\underline{x} : \underline{y}_1, \underline{y}_2 | \underline{e}) \geq 2 E_{CMI}(\underline{x} : \underline{y}_1) + 2 E_{CMI}(\underline{x} : \underline{y}_2) . \qquad (17)$$

Now we are free to apply the definition of CMI entanglement to the left hand side of
the previous equation to get:

$$E_{CMI}(\underline{x} : \underline{y}_1, \underline{y}_2) \geq E_{CMI}(\underline{x} : \underline{y}_1) + E_{CMI}(\underline{x} : \underline{y}_2) . \qquad (18)$$

Eq. (18) can be called super-additivity of the right side argument of $E_{CMI}$.
Since entanglement is symmetric (i.e., $E(\underline{x} : \underline{y}) = E(\underline{y} : \underline{x})$), there is also
super-additivity of the left side argument of $E_{CMI}$. Eq. (18) can also be called the
synergism of entanglement, because the whole has more entanglement than the sum
of its parts. If the inequality in Eq. (18) were in the opposite direction, we could call
it sub-additivity or anti-synergism.



3 Meta Density Matrix and Purification of a Density Matrix

In this section, we define meta density matrices, and purifications of density matrices. 
We show that any density matrix has a purification. 

A pure density matrix $\mu \in dm(\mathcal{H}_{\underline{x}_\cdot})$ of the form $\mu = \mathrm{proj}(\sum_{x_\cdot} A(x_\cdot) |x_\cdot\rangle)$ will
be called a meta density matrix. If $A(x_\cdot)$ is the full joint amplitude associated
with a Bayesian or Markov network G, we will call $\mu$ the meta density matrix of
the network G.

Suppose $J \subseteq Z_{1,N}$ and $J^c = Z_{1,N} - J$. Given a density matrix $\rho \in dm(\mathcal{H}_{\underline{x}_J})$,
we will call any pure density matrix $\mu \in dm(\mathcal{H}_{\underline{x}_\cdot})$ such that $\mathrm{tr}_{\underline{x}_{J^c}}(\mu) = \rho$, a traced
purification of $\rho$. More generally, if $\rho = \Omega(\mu)$ where the operator $\Omega$ is not a trace
operator, we will call $\mu$ a generalized purification of $\rho$.

Crucial to this paper is the well known fact that any density matrix has a 
traced purification. Next, we will present a proof of this fact. Our proof is a nice 
showcase of Bayesian net ideas and of our notation. 

Consider any $\rho \in dm(\mathcal{H}_{\underline{x}})$. Let

$$\rho = \sum_{x,x'} \rho(x, x') |x\rangle\langle x'| . \qquad (19)$$

Let $M$ be the matrix with entries $\rho(x, x')$, where $x \in St_{\underline{x}}$ labels its rows and $x' \in St_{\underline{x}}$
its columns. $M$ is a Hermitian matrix so it can be diagonalized. Let $M = U D U^\dagger$,
where $U$ is a unitary matrix, and $D$ is a real, diagonal matrix. Set $U_{x,j} = A(x|j)$ and
$D_{j,j} = |A|^2(j)$, where $N_{\underline{j}} = N_{\underline{x}}$. Then

$$\rho(x, x') = \sum_j A(x|j) |A|^2(j) A^*(x'|j) . \qquad (20)$$

If we define

$$\mu = \sum_{x,j} A(x|j) A(j) |x, j\rangle , \qquad (21)$$

then

$$\rho = \mathrm{tr}_{\underline{j}}\, \mathrm{proj}(\mu) . \qquad (22)$$

A Bayesian net representation of the previous equation is

$$\rho = (\underline{x}) \leftarrow \overset{\mathrm{tr}}{(\underline{j})} . \qquad (23)$$

The tr over the $\underline{j}$ is intended to indicate that node $\underline{j}$ should be traced over. Note
that the eigenvectors of $\rho$ become the transition amplitudes of node $\underline{x}$, whereas the
square roots of the eigenvalues of $\rho$ become the amplitudes of node $\underline{j}$.
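
The purification argument can be checked on a small example. Rather than diagonalizing a given matrix, the sketch below starts from a hypothetical 2x2 unitary $U$ (whose columns play the role of $A(x|j)$) and hypothetical eigenvalues (playing the role of $|A|^2(j)$), builds $\rho$ as in Eq. (20), builds the purifying state vector as in Eq. (21), and verifies that tracing out $\underline{j}$ recovers $\rho$.

```python
import cmath
import math

t = 0.4  # hypothetical mixing angle defining a 2x2 unitary
U = [[complex(math.cos(t)), -math.sin(t) * cmath.exp(-0.3j)],
     [math.sin(t) * cmath.exp(0.3j), complex(math.cos(t))]]
probs = [0.7, 0.3]  # hypothetical eigenvalues, i.e. |A|^2(j)
n = 2

# Eq. (20): rho(x, x') = sum_j A(x|j) |A|^2(j) A*(x'|j).
rho = [[sum(U[x][j] * probs[j] * U[xp][j].conjugate() for j in range(n))
        for xp in range(n)] for x in range(n)]

# Eq. (21): coefficients A(x|j) A(j) of the state vector on (x, j),
# with A(j) chosen as the non-negative square root of the eigenvalue.
ket = {(x, j): U[x][j] * math.sqrt(probs[j])
       for x in range(n) for j in range(n)}

# Eq. (22): trace node j out of proj(ket).
reduced = [[sum(ket[(x, j)] * ket[(xp, j)].conjugate() for j in range(n))
            for xp in range(n)] for x in range(n)]
```

Here the phase of $A(j)$ is taken to be zero; any other phase choice would give the same reduced matrix, since only $|A|^2(j)$ survives the trace.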

4 Measurements of the Meta Density Matrix 

We've shown that any density matrix $\rho$ has a traced purification $\mu$. Thus, without loss
of generality, we need only consider meta density matrices $\mu$ and those density
matrices obtained by applying measurement operators to $\mu$. In this section, we describe
a "complete" set of measurement operators that can be applied to a meta density
matrix to obtain all measurable probabilities codified within it.

First, consider classical physics. In particular, consider $N$ random variables $\underline{x}_\cdot$
described by a probability distribution $P(x_\cdot)$. Suppose

$$Z_{1,N} = Z_{vis} \cup Z_{sum} , \qquad (24)$$

where $Z_{vis}$ and $Z_{sum}$ are disjoint sets. Here "vis" stands for "visible" and "sum" for
"summed". The probability that $\underline{x}_{Z_{vis}} = x_{Z_{vis}}$ is defined as

$$P(x_{Z_{vis}}) = \sum_{x_{Z_{sum}}} P(x_\cdot) . \qquad (25)$$

$Z_{vis}$ can also be split into two parts. Let

$$Z_{vis} = Z_{post} \cup Z_{pre} , \qquad (26)$$

where $Z_{post}$ and $Z_{pre}$ are disjoint sets. The conditional probability that $\underline{x}_{Z_{post}} = x_{Z_{post}}$
given $\underline{x}_{Z_{pre}} = x_{Z_{pre}}$ is defined as

$$P(x_{Z_{post}} | x_{Z_{pre}}) = \frac{P(x_{Z_{post}}, x_{Z_{pre}})}{P(x_{Z_{pre}})} . \qquad (27)$$

The conditional expected value (a.k.a. conditional expectation) of any complex valued
function $f(\cdot)$ of the random variable $\underline{x}_{Z_{vis}}$ is defined as:

$$E[f(\underline{x}_{Z_{vis}}) | \underline{x}_{Z_{pre}} = x_{Z_{pre}}] = \sum_{x_{Z_{post}}} f(x_{Z_{vis}}) P(x_{Z_{post}} | x_{Z_{pre}}) . \qquad (28)$$

Visible (either pre or post viewed) and hidden nodes will be indicated on a
Bayesian network by the node decorations shown in Fig. 2.
[Figure 2: node decorations for visible (post-view, pre-view) and summed (hidden) nodes.]

Figure 2: Various node decorations, used with both classical and quantum Bayesian
networks, to indicate visible and hidden nodes. In a probability $P(a|e) =
\sum_h P(a, h|e)$, $h$ is hidden, $a, e$ are visible, $e$ is pre-viewed and $a$ is post-viewed.



Next, consider quantum physics. In particular, consider $N$ random variables
$\underline{x}_\cdot$ described by a pure state

$$|\phi_{meta}\rangle = \sum_{x_\cdot} A(x_\cdot) |x_\cdot\rangle , \qquad (29)$$

or, equivalently, by the meta density matrix

$$\mu = |\phi_{meta}\rangle\langle\phi_{meta}| = \mathrm{proj}(|\phi_{meta}\rangle) . \qquad (30)$$

Our next goal is to generalize the classical physics definitions Eqs. (24) to (28) to
quantum physics. Let

$$Z_{1,N} = Z_{vis} \cup Z_{sum} , \quad Z_{sum} = Z_{Asum} \cup Z_{Psum} , \qquad (31)$$

where $Z_{vis}$, $Z_{Asum}$ and $Z_{Psum}$ are disjoint sets. Here "Asum" stands for "amplitude
summed" and "Psum" stands for "probability summed". The probability that $\underline{x}_{Z_{vis}} =
x_{Z_{vis}}$ is defined as

$$P(x_{Z_{vis}})_{\backslash \underline{x}_{Z_{Psum}}} = \frac{\sum_{x_{Z_{Psum}}} \left| \sum_{x_{Z_{Asum}}} A(x_\cdot) \right|^2}{\sum_{x_{Z_{vis}}} \mathrm{numerator}} . \qquad (32)$$

Note that, contrary to the classical physics case, this probability depends on which
random variables are summed coherently (A summed) and which are summed
incoherently (P summed). We've indicated this dependence by the subscript $\backslash \underline{x}_{Z_{Psum}}$.
The backslash in this notation is intended to evoke a mental picture of the diagonal
of a matrix, because the variables that are P summed are "diagonalized" (why we
say these variables are diagonalized will become clear later on, once the reader
sees Eq. (47b)). As in the classical physics case, let
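$$Z_{vis} = Z_{post} \cup Z_{pre} , \qquad (33)$$

where $Z_{post}$ and $Z_{pre}$ are disjoint sets. The conditional probability that $\underline{x}_{Z_{post}} = x_{Z_{post}}$
given $\underline{x}_{Z_{pre}} = x_{Z_{pre}}$ is defined, in analogy to the classical physics case, by

$$P(x_{Z_{post}} | x_{Z_{pre}})_{\backslash \underline{x}_{Z_{Psum}}} = \frac{P(x_{Z_{post}}, x_{Z_{pre}})_{\backslash \underline{x}_{Z_{Psum}}}}{\sum_{x_{Z_{post}}} \mathrm{numerator}} . \qquad (34)$$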
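
The difference between summing a node coherently and summing it incoherently can be seen in a two-node toy example. The sketch below assumes the probability of the visible nodes is obtained by summing amplitudes over the A-summed nodes inside the squared magnitude, summing the resulting squared magnitudes over the P-summed nodes, and normalizing over the visible states. The function `prob_vis` and the amplitude table are hypothetical.

```python
from itertools import product

def prob_vis(A, shape, vis, asum, psum):
    """Probability table over the visible nodes.

    A maps full state tuples to amplitudes, shape gives the number of
    states per node, and vis, asum, psum are tuples of node positions.
    """
    def assemble(xv, xa, xp):
        x = [None] * len(shape)
        for idx, val in zip(vis + asum + psum, xv + xa + xp):
            x[idx] = val
        return tuple(x)

    def ranges(idxs):
        return list(product(*(range(shape[i]) for i in idxs)))

    unnorm = {}
    for xv in ranges(vis):
        # Incoherent (P) sum outside, coherent (A) sum inside the modulus.
        unnorm[xv] = sum(
            abs(sum(A[assemble(xv, xa, xp)] for xa in ranges(asum))) ** 2
            for xp in ranges(psum))
    total = sum(unnorm.values())
    return {xv: u / total for xv, u in unnorm.items()}

# Two binary nodes; node 0 is visible, node 1 is summed over.
A = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.5, (1, 1): -0.5}
coherent = prob_vis(A, (2, 2), vis=(0,), asum=(1,), psum=())
incoherent = prob_vis(A, (2, 2), vis=(0,), asum=(), psum=(1,))
```

With these amplitudes, the coherent sum interferes destructively for the visible state 1, so `coherent` puts all the probability on state 0, whereas `incoherent` assigns probability 1/2 to each visible state.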

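Consider a Hermitian operator $\Omega_{\underline{x}_{Z_{vis}}}$ acting on $\mathcal{H}_{\underline{x}_{Z_{vis}}}$. Suppose $\{|x_{Z_{vis}}\rangle : \forall x_{Z_{vis}}\}$ are
the eigenstates of $\Omega_{\underline{x}_{Z_{vis}}}$, so that

$$\Omega_{\underline{x}_{Z_{vis}}} = \sum_{x_{Z_{vis}}} \lambda_{x_{Z_{vis}}} |x_{Z_{vis}}\rangle\langle x_{Z_{vis}}| . \qquad (35)$$

In analogy to the classical physics case, one defines the conditional expected value of
$\Omega_{\underline{x}_{Z_{vis}}}$ by

$$E[\Omega_{\underline{x}_{Z_{vis}}} | \underline{x}_{Z_{pre}} = x_{Z_{pre}}]_{\backslash \underline{x}_{Z_{Psum}}} = \sum_{x_{Z_{post}}} \lambda_{x_{Z_{vis}}} P(x_{Z_{post}} | x_{Z_{pre}})_{\backslash \underline{x}_{Z_{Psum}}} . \qquad (36)$$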

At this point, we have achieved our goal of generalizing the classical physics
definitions Eqs. (24) to (28) to quantum physics. In doing so, we've introduced the
probability $P(x_{Z_{post}} | x_{Z_{pre}})_{\backslash \underline{x}_{Z_{Psum}}}$. The rest of this section will be devoted to
explaining how this probability can be measured.

To measure $P(x_{Z_{post}} | x_{Z_{pre}})_{\backslash \underline{x}_{Z_{Psum}}}$ instead of $P(x_{Z_{vis}})_{\backslash \underline{x}_{Z_{Psum}}}$, one restricts the
range of the random variable $\underline{x}_{Z_{pre}}$ to the single value $x_{Z_{pre}}$. Of course, one must
also divide ("normalize") the restricted meta density matrix by a constant so that its
trace remains 1. Next, we show how to measure $P(x_{Z_{vis}})_{\backslash \underline{x}_{Z_{Psum}}}$.

Note that $P(x_{Z_{vis}})_{\backslash \underline{x}_{Z_{Psum}}}$ given by Eq. (32) can be expressed as the expected
value, in the meta density matrix $\mu$, of a projection operator $\pi$:

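$$P(x_{Z_{vis}})_{\backslash \underline{x}_{Z_{Psum}}} = \frac{\sum_{x_{Z_{Psum}}} \mathrm{tr}(\pi \mu)}{\sum_{x_{Z_{vis}}} \mathrm{numerator}} . \qquad (37)$$

The projection operator $\pi$ consists of a product of 3 mutually commuting
projection operators defined by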

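$$\pi^{(a)} = \mathrm{proj}(|x_{Z_{vis}}\rangle) , \qquad (38)$$

$$\pi^{(b)} = \mathrm{proj}(|AV\, \underline{x}_{Z_{Asum}}\rangle) , \qquad (39)$$

and

$$\pi^{(c)} = \mathrm{proj}(|x_{Z_{Psum}}\rangle) . \qquad (40)$$

In $\pi^{(b)}$, we use the "average" state vector $|AV\, \underline{x}_J\rangle$, for $J \subseteq Z_{1,N}$. This vector is
defined as

$$|AV\, \underline{x}_J\rangle = \frac{\sum_{x_J} |x_J\rangle}{\sqrt{N_{\underline{x}_J}}} . \qquad (41)$$

The fact that $P(x_{Z_{vis}})_{\backslash \underline{x}_{Z_{Psum}}}$ can be expressed as an expected value of a projection
operator suggests one way of measuring it.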



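[Figure 3: node decorations for the entry $e_{\underline{x}=x}(\cdot)$, entry sum $e\Sigma_{\underline{x}}(\cdot)$, trace $\mathrm{tr}_{\underline{x}}(\cdot)$, and diagonal matrix $\mathrm{diag}_{\underline{x}}(\cdot)$ operators.]

Figure 3: Various node decorations used with quantum Bayesian networks to indicate
operators acting on the meta density matrix associated with the network.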



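Suppose $\Omega_{\underline{x},\underline{y}}$ is an operator acting on $\mathcal{H}_{\underline{x},\underline{y}}$. It is convenient at this point to
define the following super-operators acting on $\Omega_{\underline{x},\underline{y}}$:

$$e_{\underline{x}=x}(\Omega_{\underline{x},\underline{y}}) = \langle x | \Omega_{\underline{x},\underline{y}} | x \rangle , \quad \text{(entry)} \qquad (42)$$

$$e\Sigma_{\underline{x}}(\Omega_{\underline{x},\underline{y}}) = \sum_{x,x'} \langle x | \Omega_{\underline{x},\underline{y}} | x' \rangle , \quad \text{(entry sum)} \qquad (43)$$

$$\mathrm{tr}_{\underline{x}}(\Omega_{\underline{x},\underline{y}}) = \sum_x \langle x | \Omega_{\underline{x},\underline{y}} | x \rangle , \quad \text{(trace)} \qquad (44)$$

$$\mathrm{diag}_{\underline{x}}(\Omega_{\underline{x},\underline{y}}) = \sum_x |x\rangle\langle x| \, \langle x | \Omega_{\underline{x},\underline{y}} | x \rangle . \quad \text{(diagonal matrix)} \qquad (45)$$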
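We've shown in parentheses on the right hand side what we call these operators. Note
that $\mathrm{diag}_{\underline{x}} \Omega_{\underline{x},\underline{y}}$ diagonalizes $\Omega_{\underline{x},\underline{y}}$ partially. $\mathrm{diag}_{\underline{x},\underline{y}} \Omega_{\underline{x},\underline{y}}$ diagonalizes it fully.² Note
that $\omega_{\underline{a}}\, \mathrm{diag}_{\underline{a}} = \omega_{\underline{a}}$ for $\omega_{\underline{a}} = \mathrm{diag}_{\underline{a}}, \mathrm{tr}_{\underline{a}}, e_{\underline{a}=a}$. On the other hand,

$$e\Sigma_{\underline{a}}\, \mathrm{diag}_{\underline{a}} = \mathrm{tr}_{\underline{a}} . \qquad (46)$$

Fig. 3 gives node decorations that will be used to indicate these operators when acting
on a Bayesian network.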
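
The four super-operators, and the identities among them quoted above, are easy to check numerically. In the sketch below, an operator on the two-node space is stored as a dictionary keyed by $(x, y, x', y')$; this storage layout, the function names and the example amplitudes are all choices made for illustration.

```python
from itertools import product

NX, NY = 2, 2  # hypothetical numbers of states of nodes x and y
YY = list(product(range(NY), repeat=2))

def entry(op, x0):
    """Entry operator: <x0| op |x0>, an operator on y alone."""
    return {(y, yp): op[(x0, y, x0, yp)] for y, yp in YY}

def entry_sum(op):
    """Entry-sum operator: sum over x and x' of <x| op |x'>."""
    return {(y, yp): sum(op[(x, y, xp, yp)]
                         for x in range(NX) for xp in range(NX))
            for y, yp in YY}

def trace_x(op):
    """Partial trace over x: sum over x of <x| op |x>."""
    return {(y, yp): sum(op[(x, y, x, yp)] for x in range(NX))
            for y, yp in YY}

def diag_x(op):
    """Keep only the blocks with x = x'; still an operator on (x, y)."""
    return {k: (v if k[0] == k[2] else 0j) for k, v in op.items()}

def close(a, b, tol=1e-12):
    return all(abs(a[k] - b[k]) < tol for k in a)

# Example operator: proj of an arbitrary, unnormalized amplitude A(x, y).
A = {(x, y): complex(1 + x, y - 0.5) for x in range(NX) for y in range(NY)}
omega = {(x, y, xp, yp): A[(x, y)] * A[(xp, yp)].conjugate()
         for x, y in A for xp, yp in A}
```

Applying `entry_sum` after `diag_x` reproduces `trace_x`, whereas `entry_sum` alone generally differs from `trace_x` because it also picks up the off-diagonal blocks.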

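In Eq. (37), we obtained $P(x_{Z_{vis}})_{\backslash \underline{x}_{Z_{Psum}}}$ as an expected value of a projection
operator. Alternatively, $P(x_{Z_{vis}})_{\backslash \underline{x}_{Z_{Psum}}}$ can be obtained by successive applications of
the operators $e()$, $e\Sigma()$, $\mathrm{tr}()$, and $\mathrm{diag}()$ to $\mu$:

$$P(x_{Z_{vis}})_{\backslash \underline{x}_{Z_{Psum}}} = \frac{e_{\underline{x}_{Z_{vis}} = x_{Z_{vis}}}\, e\Sigma_{\underline{x}_{Z_{Asum}}}\, \mathrm{tr}_{\underline{x}_{Z_{Psum}}}\, \mu}{\sum_{x_{Z_{vis}}} \mathrm{numerator}} \qquad (47a)$$

$$= \frac{e_{\underline{x}_{Z_{vis}} = x_{Z_{vis}}}\, e\Sigma_{\underline{x}_{Z_{Asum}}, \underline{x}_{Z_{Psum}}}\, \mathrm{diag}_{\underline{x}_{Z_{Psum}}}\, \mu}{\sum_{x_{Z_{vis}}} \mathrm{numerator}} . \qquad (47b)$$

Eq. (47b) follows from Eq. (46). Here, the operators $e()$, $e\Sigma()$, $\mathrm{tr}()$, and $\mathrm{diag}()$ can be
interpreted as measurements (or lack thereof) of the density matrix they act upon.³
In Eqs. (47), $\mathrm{tr}_{\underline{x}_{Z_{Psum}}}$ means observe (= measure) the random variable $\underline{x}_{Z_{Psum}}$,
and then forget the outcome. $e_{\underline{x}_{Z_{vis}} = x_{Z_{vis}}}$ means measure the random variable $\underline{x}_{Z_{vis}}$
once. $e\Sigma_{\underline{x}_{Z_{Asum}}}$ means do not observe the random variable $\underline{x}_{Z_{Asum}}$. It remains for us to
interpret $\mathrm{diag}_{\underline{x}_{Z_{Psum}}}$ as a measurement.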

" Psum 

For any density matrix $\rho_{\underline{x},\underline{y}} \in dm(\mathcal{H}_{\underline{x},\underline{y}})$, the operator $\mathrm{diag}_{\underline{x}}$ is what is called
a von Neumann measurement. It can be implemented physically in two steps: (1)
measure the random variable $\underline{x}$; if the outcome is $x$, emit the corresponding
post-measurement state, and (2) repeat the
measurement many times, without discriminating on any of the outcomes
(mathematically, this corresponds to summing over the outcomes $x$ of the measurements).

A second way of implementing $\mathrm{diag}_{\underline{x}}$ is as follows. The Bayesian net

$$(\underline{x}) \leftarrow (\underline{y}) \qquad (48)$$



2 Previously, we defined $\mathrm{diag}(\cdot)$ to be a function that takes a vector $x$ and returns a diagonal
matrix with $x$ along its diagonal. Here we are defining a different $\mathrm{diag}(\cdot)$ function. Both of these
functions return a diagonal matrix, but they have different domains. We will use the symbol $\mathrm{diag}(\cdot)$
for both of these functions. Which function we mean will be clear from the context.

3 The software program Quantum Fog can calculate $P(x_{Z_{post}} | x_{Z_{pre}})_{\backslash \underline{x}_{Z_{Psum}}}$ numerically.
Conditioning on $\underline{x}_{Z_{pre}} = x_{Z_{pre}}$ is already implemented in the current version, 2.0, of Quantum Fog; it
corresponds to allowing only one "active" state for each of the nodes $\underline{x}_j$ for $j \in Z_{pre}$. On the other
hand, only a special case of the distinction between P-summed and A-summed is implemented in
version 2.0. In version 2.0, $\underline{x}_{Z_{Psum}}$ is always assumed to equal the set of external nodes minus the
set of visible ones. More general sets $\underline{x}_{Z_{Psum}}$ will be implemented in future versions of Quantum
Fog.
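with transition matrix $A(x|y) A(y)$ can be replaced by a Bayesian net

$$(\underline{x}') \leftarrow (\underline{x}) \leftarrow (\underline{y}) \qquad (49)$$

with transition matrix $A(x'|x) A(x|y) A(y)$, where $A(x'|x) = e^{i\theta_x} \delta(x', x)\ \forall x, x' \in St_{\underline{x}}$.
Assume that the variables $\{\theta_x : \forall x\}$ are i.i.d. (independent, identically distributed)
classical random variables, and each is uniformly distributed over $[0, 2\pi]$. Let an
overline denote an average over these variables. An effect of adding the node $\underline{x}'$ to
the network is that we must replace

$$\rho_{\underline{x},\underline{y}} = \sum_{x,y,x',y'} \rho_{xy,x'y'} |xy\rangle\langle x'y'| \qquad (50)$$

by

$$\rho_{\underline{x}^\theta,\underline{y}} = \sum_{x,y,x',y'} \rho_{xy,x'y'}\, e^{i\theta_x} |xy\rangle\langle x'y'| e^{-i\theta_{x'}} . \qquad (51)$$

Clearly,

$$\overline{\rho_{\underline{x}^\theta,\underline{y}}} = \mathrm{diag}_{\underline{x}}\, \rho_{\underline{x},\underline{y}} , \qquad (52)$$

and⁴

$$\overline{\mathrm{tr}_{\underline{x},\underline{y}}[\Omega\, \rho_{\underline{x}^\theta,\underline{y}}]} = \mathrm{tr}_{\underline{x},\underline{y}}[\Omega\, \overline{\rho_{\underline{x}^\theta,\underline{y}}}] = \mathrm{tr}_{\underline{x},\underline{y}}[\Omega\, \mathrm{diag}_{\underline{x}}\, \rho_{\underline{x},\underline{y}}] , \qquad (53)$$

for any operator $\Omega$ acting on $\mathcal{H}_{\underline{x},\underline{y}}$. Thus, the operator $\mathrm{diag}_{\underline{x}}$ can be implemented
physically merely by taking many measurements for which $\theta_x$ varies randomly.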
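
The phase-averaging argument can be simulated directly. The following sketch is a Monte Carlo illustration under stated assumptions: the density matrix is the projector onto an arbitrary made-up two-node state, the phases are drawn with Python's `random` module using a fixed seed, and the number of samples is an arbitrary choice. It multiplies each matrix element by the random phase factor and averages, then compares the result with the matrix obtained by zeroing the blocks with $x \neq x'$.

```python
import cmath
import math
import random
from itertools import product

NX, NY = 2, 2  # hypothetical numbers of states of nodes x and y

# Arbitrary normalized pure state on (x, y) and its density matrix,
# stored as a dict keyed by (x, y, x', y').
A = {(x, y): complex(1 + x, y - 0.5) for x, y in product(range(NX), range(NY))}
norm = math.sqrt(sum(abs(a) ** 2 for a in A.values()))
rho = {(x, y, xp, yp): A[(x, y)] * A[(xp, yp)].conjugate() / norm ** 2
       for x, y, xp, yp in product(range(NX), range(NY), range(NX), range(NY))}

def dephased_average(rho, samples=5000, seed=0):
    """Average of rho with random phases e^{i(theta_x - theta_x')} applied."""
    rng = random.Random(seed)
    acc = {k: 0j for k in rho}
    for _ in range(samples):
        theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(NX)]
        for (x, y, xp, yp), v in rho.items():
            acc[(x, y, xp, yp)] += v * cmath.exp(1j * (theta[x] - theta[xp]))
    return {k: v / samples for k, v in acc.items()}

avg = dephased_average(rho)
# Target: the same matrix with all x != x' blocks set to zero.
target = {k: (v if k[0] == k[2] else 0j) for k, v in rho.items()}
max_error = max(abs(avg[k] - target[k]) for k in rho)
```

The blocks with $x = x'$ are untouched by the phases, while the others are multiplied by a phase factor whose average vanishes, so `max_error` shrinks roughly like the inverse square root of the number of samples.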

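A third way of implementing $\mathrm{diag}_{\underline{x}}$ is by adding an additional node that is
traced over. For example, suppose $\rho_{\underline{x}} \in dm(\mathcal{H}_{\underline{x}})$ can be expressed in the form

$$\rho_{\underline{x}} = \mathrm{diag}_{\underline{x}}(\mu) , \quad \mu = \mathrm{proj}(\sum_x A(x) |x\rangle) . \qquad (54)$$

We can introduce a node $\underline{j}$ such that $St_{\underline{j}} = St_{\underline{x}}$ and $A(j) = A(\underline{x} = j)$. Then

$$\rho_{\underline{x}} = \mathrm{tr}_{\underline{j}}(\mu') , \quad \mu' = \mathrm{proj}(\sum_{x,j} \delta^x_j A(j) |x, j\rangle) . \qquad (55)$$

$\mu$ is a generalized purification of $\rho_{\underline{x}}$ whereas $\mu'$ is a traced one. By expressing $\rho_{\underline{x}}$ in
terms of $\mu'$ instead of $\mu$, we get rid of the $\mathrm{diag}_{\underline{x}}$ operator at the expense of adding
an additional node $\underline{j}$ that we trace over. A Bayesian network representation of the
essence of Eqs. (54) and (55) is:

4 Of course, for an arbitrary polynomial function $f$, one has $\overline{f(\rho_{\underline{x}^\theta,\underline{y}})} \neq f(\overline{\rho_{\underline{x}^\theta,\underline{y}}})$; but this is not a
show stopper, since the density matrix only enters linearly in the formula for the expected value of
any observable.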
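$$\overset{\mathrm{diag}}{(\underline{x})} = (\underline{x}) \leftarrow \overset{\mathrm{tr}}{(\underline{j})} . \qquad (56)$$

As a more general example of this method of implementing $\mathrm{diag}_{\underline{x}}$, suppose $\rho_{\underline{y},\underline{x}} \in
dm(\mathcal{H}_{\underline{y},\underline{x}})$ can be expressed in the form

$$\rho_{\underline{y},\underline{x}} = \mathrm{diag}_{\underline{x}}(\mu) , \quad \mu = \mathrm{proj}(\sum_{y,x} A(y|x) A(x) |y, x\rangle) . \qquad (57)$$

Once again, introduce a node $\underline{j}$ such that $St_{\underline{j}} = St_{\underline{x}}$ and $A(j) = A(\underline{x} = j)$. Then

$$\rho_{\underline{y},\underline{x}} = \mathrm{tr}_{\underline{j}}(\mu') , \quad \mu' = \mathrm{proj}(\sum_{y,x,j} A(y|x) \delta^x_j A(j) |y, x, j\rangle) . \qquad (58)$$

A Bayesian network representation of the essence of Eqs. (57) and (58) is:

$$(\underline{y}) \leftarrow \overset{\mathrm{diag}}{(\underline{x})} = (\underline{y}) \leftarrow (\underline{x}) \leftarrow \overset{\mathrm{tr}}{(\underline{j})} . \qquad (59)$$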

The Schmidt Decomposition is very popular in the Quantum Information Theory
literature. As an illustration of the use of the entry-sum operator $e\Sigma$, let us consider
the Schmidt Decomposition from the point of view of Bayesian networks. The
Schmidt Decomposition is the statement that given a pure state $\mu_1 \in dm(\mathcal{H}_{x,y})$ of
the form

$$ \mu_1 = \mathrm{proj}\Big(\sum_{x,y} A(x,y)|x,y\rangle\Big) , \tag{60} $$

the coefficients $A(x,y)$ can be expressed in the form

$$ A(x,y) = \sum_j A(x|j)A(y|j)A(j) , \tag{61} $$

where $A(j) \geq 0$ $\forall j$, $\sum_j |A(j)|^2 = 1$, $\sum_x |A(x|j)|^2 = 1$ $\forall j$, $\sum_y |A(y|j)|^2 = 1$ $\forall j$.

The fact that any $A(x,y)$ can be expressed in the form given by Eq.(61) is a
re-statement of the Singular Value Decomposition Theorem. This is why. Let $M$ be
the matrix with entries $A(x,y)$, where $x \in St_x$ labels its rows and $y \in St_y$ its columns.
According to the Singular Value Decomposition theorem, $M$ can be expressed in the
form $M = UDV^\dagger$, where $U$ and $V$ are unitary matrices and $D$ is a non-negative,
diagonal matrix. If we let $U_{x,j} = A(x|j)$, $D_{j,j} = A(j)$, $V^*_{y,j} = A(y|j)$, then Eq.(61)
follows.

To obtain a Bayesian net picture of the Schmidt Decomposition, note that if
we define $\mu_2 \in dm(\mathcal{H}_{x,y,j})$ by

$$ \mu_2 = \mathrm{proj}\Big(\sum_{x,y,j} A(x|j)A(y|j)A(j)|x,y,j\rangle\Big) , \tag{62} $$

then

$$ e\Sigma_j(\mu_2) = \mathrm{proj}\Big(\sum_{x,y,j} A(x|j)A(y|j)A(j)|x,y\rangle\Big) \tag{63a} $$
$$ = \mu_1 . \tag{63b} $$

Figure 4: Bayesian net representation of the Schmidt Decomposition, as given by
Eq.(63b).

Eq.(63b) is illustrated by Fig.4.

Eq.(63b) gives an example of the use of the entry-sum operator $e\Sigma$. Note that
this operator takes a pure state of tensor rank $n \geq 2$ into a pure state of tensor rank
$n - 1$. Indeed,

$$ e\Sigma_a\, \mathrm{proj}\Big(\sum_{x,a} A(x,a)|x,a\rangle\Big) = \mathrm{proj}\Big(\sum_{x,a} A(x,a)|x\rangle\Big) . \tag{64} $$



$e\Sigma$ also takes a pure state of tensor rank $n = 1$ into a non-negative number. Indeed,
for m e H a , 



(65) 



Note that when $N_a = 1$, the entry-sum operator $e\Sigma_a$ equals the entry operator
$e_{\underline{a}=a}$. Thus, $e_{\underline{a}=a}$ can be viewed as a special case of $e\Sigma_a$. It's clear that $e_{\underline{a}=a}$ inherits
from $e\Sigma_a$ the property that it takes a pure state of tensor rank $n \geq 2$ into a pure
state of tensor rank $n - 1$, and it takes a pure state of tensor rank $n = 1$ into a
non-negative number.

Suppose $\mu \in dm(\mathcal{H}_{x_{Z_{1,N}}})$ is a pure density matrix, and $\rho$ is a density matrix,
and $\rho = (\prod_{j \in J} \Omega_{x_j})\mu$, where $J \subset Z_{1,N}$ and $\Omega_{x_j} \in \{e_{\underline{x}_j = x_j}, e\Sigma_{x_j}, \mathrm{tr}_{x_j}, \mathrm{diag}_{x_j}\}$. We've
shown that $e_{\underline{x}_j = x_j}$ and $e\Sigma_{x_j}$ both take a pure density matrix to another pure density
matrix, so one can easily find a pure density matrix $\mu' \in dm(\mathcal{H}_{x_{Z_{1,N'}}})$ such that
$\rho = (\prod_{j \in J'} \Omega_{x_j})\mu'$, where $J' \subset Z_{1,N'}$ and $\Omega_{x_j} \in \{\mathrm{tr}_{x_j}, \mathrm{diag}_{x_j}\}$. We've shown that each
operator $\mathrm{diag}_{x_j}$ can be traded for an extra node that is traced over. Thus, one can easily
find a pure density matrix $\mu'' \in dm(\mathcal{H}_{x_{Z_{1,N''}}})$ such that $\rho = (\prod_{j \in J''} \mathrm{tr}_{x_j})\mu''$, where
$J'' \subset Z_{1,N''}$. To summarize, given a generalized (i.e., made with entry, entry-sum,
trace and diag operators) purification of $\rho$, one can easily find a traced purification of
$\rho$. A generalized purification of $\rho$ might be convenient for certain purposes, but not
for others. Luckily, it can be easily replaced by a traced one.



5 Conditional Amplitudes 

In this section, we define conditional amplitudes. These are a natural generalization 
of conditional probabilities. 

Consider a meta density matrix $\mu = \mathrm{proj}(\sum_{x.} A(x.)|x.\rangle)$. Its complex amplitude
$A(x.)$ can be parameterized as

$$ A(x.) = e^{i\theta(x.)}\sqrt{P(x.)} , \tag{66} $$

where the $\theta(x.)$ are real and $P \in pd(St_{x.})$. Choose an arbitrary state of $(x.)$, and call
it the reference state $(x.^0)$. It is convenient to constrain $\theta(x.)$ by assuming that
it vanishes at the reference state:

$$ \theta(x.^0) = 0 . \tag{67} $$
For $J \subset Z_{1,N}$ and $J^c = Z_{1,N} - J$, we define

$$ \theta(x_J) = \theta(x_J, x^0_{J^c}) , \tag{68a} $$

$$ P(x_J) = \sum_{x_{J^c}} P(x.) , \tag{68b} $$

and

$$ A(x_J) = e^{i\theta(x_J)}\sqrt{P(x_J)} . \tag{68c} $$
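For disjoint sets $J_1, J_2 \subset Z_{1,N}$, we define

$$ \theta(x_{J_1}|x_{J_2}) = \theta(x_{J_1}, x_{J_2}) - \theta(x_{J_2}) , \tag{69a} $$

$$ P(x_{J_1}|x_{J_2}) = \frac{P(x_{J_1}, x_{J_2})}{P(x_{J_2})} , \tag{69b} $$

and

$$ A(x_{J_1}|x_{J_2}) = \frac{A(x_{J_1}, x_{J_2})}{A(x_{J_2})} . \tag{69c} $$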
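Note that

$$ \theta(x^0_{J_1}|x_{J_2}) = 0 , \tag{70} $$

$$ \mathrm{phase}(\langle x.|\mu|y.\rangle) = \theta(x.) - \theta(y.) , \tag{71} $$

and

$$ \theta(x.) = \mathrm{phase}(\langle x.|\mu|x.^0\rangle) . \tag{72} $$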
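
The definitions of this section can be tried out numerically. The following is a minimal sketch, not part of the original paper: it assumes Boolean nodes indexed 0, 1, 2, a joint amplitude stored as a Python dict, and nowhere-vanishing amplitudes so that every phase is defined. The helper names (theta, prob, amp, cond_amp) and the numbers in the example are hypothetical. It evaluates $A(x_J)$ and $A(x_{J_1}|x_{J_2})$ as in Eqs.(68) and (69), and checks that a telescoping product of conditional amplitudes returns the joint amplitude.

```python
import cmath
import itertools
import math

def theta(A, ref, J, xJ):
    """theta(x_J): phase of A with every node outside J held at its reference state."""
    full = list(ref)
    for j, v in zip(J, xJ):
        full[j] = v
    return cmath.phase(A[tuple(full)])

def prob(A, J, xJ):
    """P(x_J): |A|^2 summed over the states of the nodes outside J."""
    return sum(abs(a) ** 2 for x, a in A.items()
               if all(x[j] == v for j, v in zip(J, xJ)))

def amp(A, ref, J, xJ):
    """A(x_J) = exp(i theta(x_J)) sqrt(P(x_J))."""
    return cmath.exp(1j * theta(A, ref, J, xJ)) * math.sqrt(prob(A, J, xJ))

def cond_amp(A, ref, J1, x1, J2, x2):
    """A(x_J1 | x_J2) = A(x_J1, x_J2) / A(x_J2) for disjoint index tuples J1, J2."""
    return amp(A, ref, J1 + J2, x1 + x2) / amp(A, ref, J2, x2)

# Hypothetical 3-node Boolean example; the numbers are arbitrary.
states = list(itertools.product((0, 1), repeat=3))
ref = (0, 0, 0)
raw = {x: complex(1 + x[0] + 2 * x[1], 0.5 * x[2] - 0.3 * x[0] * x[1]) for x in states}
norm = math.sqrt(sum(abs(a) ** 2 for a in raw.values()))
A = {x: a / norm for x, a in raw.items()}  # raw[ref] = 1, so theta vanishes at ref

# A(x_2 | x_1, x_0) A(x_1 | x_0) A(x_0) should reproduce A(x_0, x_1, x_2).
x = (1, 0, 1)
product = (cond_amp(A, ref, (2,), (x[2],), (0, 1), (x[0], x[1]))
           * cond_amp(A, ref, (1,), (x[1],), (0,), (x[0],))
           * amp(A, ref, (0,), (x[0],)))
print(abs(product - A[x]) < 1e-12)
```

Storing the amplitude as a dict keyed by state tuples keeps the reference-state substitution in theta explicit, at the cost of speed; an array-based version would be preferable for larger state spaces.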

6 Probabilistic Conditional Independence 

This section, divided into 3 subsections, explores the notion of conditional indepen- 
dence in both classical and quantum physics. 

Henceforth, by an independency, we will mean a triplet $(x_J \perp x_K|x_E)$,
where $J, K, E \subset Z_{1,N}$ are disjoint. (If $J$ and $K$ are disjoint but overlap with $E$,
replace $(x_J \perp x_K|x_E)$ by $(x_{J-E} \perp x_{K-E}|x_E)$). If the sets $J$ and $K$ both contain more
than one element, we will call it a global independency. If $|J| + |K| + |E| = N$,
we will say that $(x_J \perp x_K|x_E)$ is an all-encompassing independency. We will
use the word I-set as an abbreviation for "independencies set"; that is, a set whose
members are independencies. It is convenient to introduce a symbol for the set of all
possible independencies:

$$ \mathcal{I}(x.) = \{I : I = (x_J \perp x_K|x_E);\ J, K, E \subset Z_{1,N} \text{ are disjoint}\} . \tag{73} $$

6.1 Types of Probabilistic Conditional Independence 

In this section, we define classical conditional independence and three quantum
analogues of it: type-A, type-CMI, and type-CMI'.

Consider first classical physics and probability. Let $J, K, E \subset Z_{1,N}$ be disjoint
sets. We say $x_J$ and $x_K$ are conditionally independent given $x_E$ iff

$$ P(x_J, x_K|x_E) = P(x_J|x_E)P(x_K|x_E) \quad \forall x_J, x_K, x_E . \tag{74} $$

Eq.(74) is clearly equivalent to requiring that

$$ P(x_J|x_K, x_E) = P(x_J|x_E) , \tag{75} $$

or

$$ P(x_K|x_J, x_E) = P(x_K|x_E) . \tag{76} $$

We define the function $\tau^P : \mathcal{I}(x.) \to Bool$ by the statement: $\tau^P(x_J \perp x_K|x_E)$ is
true iff Eq.(74) is true. Think of $\tau$ as a "truth function" that decides whether its
argument is false=0 or true=1.

In classical physics, conditional independence and vanishing CMI are equivalent.
Indeed,

Theorem 1

$$ H(x : y) = 0 \ \text{ iff } \ P(x,y) = P(x)P(y) \quad \forall x, y , \tag{77} $$

and

$$ H(x : y|e) = 0 \ \text{ iff } \ P(x,y|e) = P(x|e)P(y|e) \quad \forall x, y, e . \tag{78} $$

proof: The proof can be found in Ref.[14].
QED

Now consider quantum physics. Our goal is to find the quantum counterpart
of Eq.(74) and Theorem 1. Consider a meta density matrix $\mu = \mathrm{proj}(\sum_{x.} A(x.)|x.\rangle)$.
Let $J, K, E \subset Z_{1,N}$ be disjoint sets. We say $x_J$ and $x_K$ are type-A conditionally
independent given $x_E$ iff

$$ A(x_J, x_K|x_E) = A(x_J|x_E)A(x_K|x_E) \quad \forall x_J, x_K, x_E . \tag{79} $$

We say $x_J$ and $x_K$ are type-CMI conditionally independent given $x_E$ iff

$$ S_\mu(x_J : x_K|x_E) = 0 . \tag{80} $$

(Note that we trace over all random variables $x_n$ such that $n \notin J \cup K \cup E$). We say
$x_J$ and $x_K$ are type-CMI' conditionally independent given $x_E$ iff

$$ S_{\mathrm{diag}_{x_E}(\mu)}(x_J : x_K|x_E) = 0 . \tag{81} $$

We define the function $\tau^A : \mathcal{I}(x.) \to Bool$ by the statement: $\tau^A(x_J \perp x_K|x_E)$ is
true iff Eq.(79) is true. Likewise, $\tau^{CMI}(x_J \perp x_K|x_E)$ iff Eq.(80). Likewise,
$\tau^{CMI'}(x_J \perp x_K|x_E)$ iff Eq.(81).

In classical physics, type-A and type-CMI conditional independence are equiv- 
alent, but in quantum physics, neither one implies the other. We will give counterex- 
amples of this later. But first, we will give easy-to-check necessary and sufficient 
conditions for a vanishing quantum CMI. 

Theorem 2 For $\rho_{x,y} \in dm(\mathcal{H}_{x,y})$,

$$ S_{\rho_{x,y}}(x : y) = 0 \ \text{ iff } \ \rho_{x,y} = \rho_x \rho_y , \tag{82} $$

where $\rho_x$ and $\rho_y$ are partial traces of $\rho_{x,y}$. For $\rho_{x,y,e} \in dm(\mathcal{H}_{x,y,e})$,

$$ S_{\rho_{x,y,e}}(x : y|e) = 0 \ \text{ iff } \ \rho_{x,y,e} = \sum_e |e\rangle\langle e|\, w(e)\rho_x^{(e)}\rho_y^{(e)} , \tag{83} $$

where $w(\cdot) \in pd(St_e)$, and, for all $e$, $\rho_x^{(e)} \in dm(\mathcal{H}_x)$, $\rho_y^{(e)} \in dm(\mathcal{H}_y)$.

proof: Eq.(83) implies Eq.(82). Proving $\Leftarrow$ for Eq.(83) is a simple calculation. It
was pointed out in Ref. [To] . Proving $\Rightarrow$ for Eq.(83) is much more technical. A
weak version of it was proven in Ref.[15]. The strong version presented here was first
proven in Ref.[16].
QED

Theorem 3 Consider a meta density matrix $\mu = \mathrm{proj}(\sum_{x.} A(x.)|x.\rangle)$. Suppose
$K_1, K_2, E \subset Z_{1,N}$ are disjoint sets, $U = K_1 \cup K_2 \cup E$, and $U^c = Z_{1,N} - U$. Let
$I = (x_{K_1} \perp x_{K_2}|x_E)$.

$\tau^{CMI'}(I)$ iff $\forall (x_E, x_{K_1}, x_{K_2}, x'_{K_1}, x'_{K_2})$,

$$ \sum_{x_{U^c}} A(x_{K_1}, x_{K_2}, x_{U^c}, x_E)\, A^*(x'_{K_1}, x'_{K_2}, x_{U^c}, x_E) = w(x_E)\, \rho_1^{(x_E)}(x_{K_1}, x'_{K_1})\, \rho_2^{(x_E)}(x_{K_2}, x'_{K_2}) , \tag{84} $$

where $w(\cdot) \in pd(St_{x_E})$, and where, for $j \in \{1,2\}$, $\forall x_E$, $\rho_j^{(x_E)} \in dm(\mathcal{H}_{x_{K_j}})$.

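proof:

Define $\rho$ by

$$ \rho = \mathrm{tr}_{x_{U^c}}\, \mathrm{diag}_{x_E}(\mu) \tag{85a} $$

$$ = \sum_{x_E, x_{K_1}, x_{K_2}, x'_{K_1}, x'_{K_2}} \Big[\sum_{x_{U^c}} A(x_{K_1}, x_{K_2}, x_{U^c}, x_E)\, A^*(x'_{K_1}, x'_{K_2}, x_{U^c}, x_E)\Big]\, |x_{K_1}, x_{K_2}, x_E\rangle\langle x'_{K_1}, x'_{K_2}, x_E| . \tag{85b} $$

$\tau^{CMI'}(I)$ is equivalent to $S_\rho(x_{K_1} : x_{K_2}|x_E) = 0$.
Recall that for any $\rho_{sol} \in dm(\mathcal{H}_{x_U})$,

$$ S_{\rho_{sol}}(x_{K_1} : x_{K_2}|x_E) = 0 \ \text{ iff } \ \rho_{sol} = \sum_{x_E} |x_E\rangle\langle x_E|\, w(x_E)\, \rho_1^{(x_E)} \rho_2^{(x_E)} . \tag{86} $$

($\Rightarrow$) By setting $\rho$ equal to $\rho_{sol}$, we prove Eq.(84).

($\Leftarrow$) By plugging Eq.(84) into Eq.(85b), we show that $\rho$ satisfies the right hand
side of Eq.(86), so it satisfies the left hand side of the same equation.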
QED 

We are finally ready to prove that for type-A and type-CMI conditional inde- 
pendence, neither one of these implies the other. 

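Theorem 4 Suppose $K_1, K_2, E \subset Z_{1,N}$ are disjoint sets, and $I = (x_{K_1} \perp x_{K_2}|x_E)$.
$\tau^{CMI}(I) \not\Rightarrow \tau^A(I)$ and $\tau^{CMI}(I) \not\Leftarrow \tau^A(I)$. Also, $\tau^{CMI'}(I) \not\Rightarrow \tau^A(I)$ and
$\tau^{CMI'}(I) \not\Leftarrow \tau^A(I)$. Also, $\tau^{CMI}(I) \Rightarrow \tau^{CMI'}(I)$.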

proof: Let $U = K_1 \cup K_2 \cup E$, and $U^c = Z_{1,N} - U$. For our counterexamples, we
will assume $x_{K_1} \to x_1$, $x_{K_2} \to x_2$, $x_{U^c} \to a$, where $x_1, x_2, a$ are Boolean variables.
We will take $N_{x_E} = 1$, and indications of any dependence on $x_E$ will be suppressed.
We will take $(0,0,0)$ to be our reference state (i.e., the state $(x_1^0, x_2^0, a^0)$ for which
$\theta(x_1^0, x_2^0, a^0) = 0$). We will abbreviate $\theta(x_1, x_2, a)$ by $\theta_{x_1,x_2,a}$.

$\tau^{CMI}(I) \Rightarrow \tau^{CMI'}(I)$ is obvious. Since we will assume $N_{x_E} = 1$, our example
of $\tau^{CMI'}(I) \not\Rightarrow \tau^A(I)$ will also prove $\tau^{CMI}(I) \not\Rightarrow \tau^A(I)$. Likewise, our example of
$\tau^{CMI'}(I) \not\Leftarrow \tau^A(I)$ will also prove $\tau^{CMI}(I) \not\Leftarrow \tau^A(I)$.

(proof of $\tau^{CMI'} \not\Rightarrow \tau^A$) Assume



rl,l e i0 x-i,x 2 ,a 



A(xi,x 2 , a) 

Xi , X2 ,a = tfltLa > where e e M - 2ttZ 
This X2, a) satisfies 



(87) 



E 



A(xi,x 2 , a) 
A*(x 1 , x 2 , a) 



33 ]_ ^CC CC^^CCr^ 



Therefore, $\tau^{CMI'}(I)$ is true. This $A(x_1, x_2, a)$ also satisfies



5 1 ' 1 



Hence 



A(x 1 ,x 2 )=6£ >X2 e i0 ™ = 

^(Zl) = ^e* 100 = <£i 

A(x 2 ) = SLe»™ = SL 



$$ A(x_1, x_2) \neq A(x_1)A(x_2) , $$

which means $\tau^A(I)$ is false.

(proof of $\tau^{CMI'} \not\Leftarrow \tau^A$) Assume



(89) 



(90) 



A(x 1 ,x 2 ,a) 



This A(xi,X2,a) satisfies 
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tfltLa , where (6l - 27rZ 



(91) 



v 7 ! 



A(xi,x 2 ) 

A / \ £>" V X] UU I 

A(s 2 ) - e ^ 2 ° 



1 

7i 



1 



Therefore, 



(92) 



$$ A(x_1, x_2) = A(x_1)A(x_2) , $$
which means $\tau^A(I)$ is true. This $A(x_1, x_2, a)$ also satisfies



(93) 



E 



A(xi,x 2: a) 

^4 *^2' ^0 



1 »£[<ki.*2-<U _/ 
1 + e i-p^ 



(94)

Let's show that assuming $\tau^{CMI'}(I)$ leads to a contradiction. Theorem 3 implies (i)
and Eq.(91) implies (ii) in the following:



phase[pi(0, 0)p 2 (l, 0)] = phase( s ^ j 



4(0,1, a) 
A*{0,0,a) 



(U) 



(95) 



and 



phase[pi(l, l)p 2 (l) 0)] = phasejS^ 

a 

Since $\rho_1(0,0)$ and $\rho_1(1,1)$ are supposed to be real, the right hand sides of the two
previous equations are supposed to be equal. They are not, which is a contradiction.
QED 

There is, however, one subset of $\mathcal{I}(x.)$ over which $\tau^A$ and $\tau^{CMI'}$ agree.

Theorem 5 Suppose $K_1, K_2, E \subset Z_{1,N}$ are disjoint sets, and $I = (x_{K_1} \perp x_{K_2}|x_E)$.
If $|K_1| + |K_2| + |E| = N$, then $\tau^A(I) = \tau^{CMI'}(I)$.

proof: According to Theorem 3, $\tau^{CMI'}(I)$ is equivalent to:



4(1,1, a) 
A*(l,0,a) 



(a) 



phase(l + e^) 



(96) 



$$ A(x_{K_1}, x_{K_2}, x_E)\, A^*(x'_{K_1}, x'_{K_2}, x_E) = w(x_E)\, \rho_1^{(x_E)}(x_{K_1}, x'_{K_1})\, \rho_2^{(x_E)}(x_{K_2}, x'_{K_2}) , \tag{97} $$

where $w(\cdot)$ is a probability distribution, and for $j = 1, 2$, $\forall x_E$, $\rho_j^{(x_E)}$ are density
matrices. $\tau^A(I)$, on the other hand, is equivalent to

$$ A(x_{K_1}, x_{K_2}, x_E) = A(x_{K_1}|x_E)A(x_{K_2}|x_E)A(x_E) . \tag{98} $$

Clearly, $\tau^A(I)$ implies $\tau^{CMI'}(I)$. To show that $\tau^{CMI'}(I)$ implies $\tau^A(I)$, define $\theta(x.) =
\mathrm{phase}(A(x.))$, $|A|(x_E) = \sqrt{w(x_E)}$, and, for $j = 1, 2$, $|A|(x_{K_j}|x_E) = \sqrt{\rho_j^{(x_E)}(x_{K_j}, x_{K_j})}$.
QED 

6.2 Reduction and Combination Rules for Independencies 

Consider the following reduction and combination rules for independencies:

(a) (Decomposition/2 -> 1)

$\tau^\eta(x \perp y_1, y_2|e) \Rightarrow \tau^\eta(x \perp y_2|e)$

(b) (Weak Union/2 -> 1')

$\tau^\eta(x \perp y_1, y_2|e) \Rightarrow \tau^\eta(x \perp y_1|y_2, e)$

(c) (Contraction/1', 1 -> 2)

$\tau^\eta(x \perp y_1|y_2, e)$ and $\tau^\eta(x \perp y_2|e) \Rightarrow \tau^\eta(x \perp y_1, y_2|e)$

(d) (Intersection/1', 1' -> 2)

$P \neq 0$ and $\tau^\eta(x \perp y_1|y_2, e)$ and $\tau^\eta(x \perp y_2|y_1, e) \Rightarrow \tau^\eta(x \perp y_1, y_2|e)$

The function $\tau^\eta : \mathcal{I}(x.) \to Bool$ remains to be specified. $x, y_1, y_2, e$ stand for mutually
exclusive n-tuples of the form $x_K$ for some $K \subset Z_{1,N}$. Rules (a) and (b) perform a
"reduction" whereas (c) and (d) perform a "combination".

An independency $I = (\cdot \perp \cdot|\cdot)$ has 3 slots. In the above rule statements,
we've denoted all random variables in the second slot (slot-2) by the letter $y$ with a
subscript.

The above rule statements start with the rule name, in parentheses. Within
the parentheses, to the left of the slash is the name given by Judea Pearl in Ref.[5]. To
the right of the slash is a new name, first given in this paper. In the new rule names,
the symbol -> stands for implication, and there is one number, indicating the number
of y's in slot-2, for each independency. For example, in rule 1', 1 -> 2, there are: one
y in slot-2 of the first independency, one y in slot-2 of the second independency, two
y's in slot-2 of the third independency. The prime in 1' indicates that, besides there
being one y in slot-2, there also is one y in slot-3.

Note that in rule (d) above, we specify that $P \neq 0$. That's because, as we
shall see, this rule arises from one of those unusual cases, mentioned earlier, in which
dividing by a probability causes trouble. Later on, we will state and prove theorems
whose proof assumes rule (d). The fact that such theorems assume rule (d) will show
up in that they inherit $P \neq 0$ as one of their premises.

Next we will show that the reduction and combination rules are obeyed by r A 
and r CMI . 

Theorem 6 The above reduction and combination rules are true in classical physics
with $\eta = P$.

proof: The classical CMI satisfies

$$ \underbrace{H(x : y_1, y_2|e)}_{h_1} = \underbrace{H(x : y_1|y_2, e)}_{h_2} + \underbrace{H(x : y_2|e)}_{h_3} . \tag{99} $$

Permuting $y_1$ and $y_2$ in the previous equation yields

$$ \underbrace{H(x : y_1, y_2|e)}_{h_1} = \underbrace{H(x : y_2|y_1, e)}_{h_5} + \underbrace{H(x : y_1|e)}_{h_6} . \tag{100} $$

Recall that the CMI is non-negative.

• proof of (a)(2 -> 1): $h_1 = 0 \Rightarrow h_3 = 0$.

• proof of (b)(2 -> 1'): $h_1 = 0 \Rightarrow h_2 = 0$.

• proof of (c)(1', 1 -> 2): $h_2 = h_3 = 0 \Rightarrow h_1 = 0$.

• proof of (d)(1', 1' -> 2): We want to show that $h_2 = h_5 = 0 \Rightarrow h_1 = 0$. Why
would this be? $h_2 = 0$ and $h_5 = 0$ imply, respectively,

$$ P(x, y_1, y_2, e) = P(x|y_2, e)P(y_1|y_2, e)P(y_2, e) , \tag{101a} $$

and

$$ P(x, y_1, y_2, e) = P(x|y_1, e)P(y_2|y_1, e)P(y_1, e) . \tag{101b} $$

We can equate the right hand sides of the two previous equations, and then
divide both sides of the resulting equation by $P(y_1, y_2, e)$ (here we use $P \neq 0$).
This yields (i) below. Since we can vary $y_1$ and $y_2$ independently in equation
(i) below, equation (ii) follows.

$$ P(x|y_1, e) \overset{(i)}{=} P(x|y_2, e) \overset{(ii)}{=} P(x|e) . \tag{102} $$

Combining Eqs.(101a) and (102) then yields

$$ P(x, y_1, y_2, e) = P(x|e)P(y_1, y_2, e) , \tag{103} $$

which, in turn, yields

$$ P(x, y_1, y_2|e) = P(x|e)P(y_1, y_2|e) . \tag{104} $$

QED
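
The proof of Theorem 6 rests on the chain-rule identity $H(x : y_1, y_2|e) = H(x : y_1|y_2, e) + H(x : y_2|e)$ together with the non-negativity of the CMI. The sketch below, which is an illustration and not part of the original paper, checks both numerically for one arbitrary, strictly positive distribution of four Boolean variables. The index convention (0 for $x$, 1 for $y_1$, 2 for $y_2$, 3 for $e$), the helper names and the weights are assumptions made for the example.

```python
import itertools
import math

def marginal(P, keep):
    """Marginal distribution over the variable indices listed in keep."""
    out = {}
    for s, p in P.items():
        key = tuple(s[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(P, keep):
    """Shannon entropy (natural log) of the marginal over keep."""
    return -sum(p * math.log(p) for p in marginal(P, keep).values() if p > 0)

def cmi(P, X, Y, E):
    """H(x : y | e) = H(x,e) + H(y,e) - H(x,y,e) - H(e) for index tuples X, Y, E."""
    return entropy(P, X + E) + entropy(P, Y + E) - entropy(P, X + Y + E) - entropy(P, E)

# Hypothetical distribution: variables are indexed (x, y1, y2, e) = (0, 1, 2, 3).
states = list(itertools.product((0, 1), repeat=4))
w = {s: 1.0 + s[0] * s[1] + 2.0 * s[1] * s[2] + 0.5 * s[0] * s[3] + s[2] * s[3]
     for s in states}
Z = sum(w.values())
P = {s: v / Z for s, v in w.items()}

h1 = cmi(P, (0,), (1, 2), (3,))   # H(x : y1, y2 | e)
h2 = cmi(P, (0,), (1,), (2, 3))   # H(x : y1 | y2, e)
h3 = cmi(P, (0,), (2,), (3,))     # H(x : y2 | e)
print(abs(h1 - (h2 + h3)) < 1e-12, min(h1, h2, h3) >= -1e-12)
```

Because each of $h_1$, $h_2$, $h_3$ is non-negative and $h_1 = h_2 + h_3$, a vanishing $h_1$ forces both $h_2$ and $h_3$ to vanish, which is the content of rules (a) and (b).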

Theorem 7 The above reduction and combination rules are true in quantum physics
with $\eta = A$.

proof:

• proof of (a)(2 -> 1): The premise is that

$$ A(x, y_1, y_2|e) = A(x|e)A(y_1, y_2|e) . \tag{105} $$

Eq.(105) implies

$$ P(x, y_1, y_2|e) = P(x|e)P(y_1, y_2|e) . \tag{106} $$

Summing both sides of the previous equation over $y_1$ yields

$$ P(x, y_2|e) = P(x|e)P(y_2|e) . \tag{107} $$

Eq.(105) also implies, since the phases of a product of amplitudes add,

$$ \theta(x, y_1, y_2|e) = \theta(x|e) + \theta(y_1, y_2|e) . \tag{108} $$

If, in the previous equation, we set $y_1$ to its reference state $y_1^0$, we get

$$ \theta(x, y_2|e) = \theta(x|e) + \theta(y_2|e) . \tag{109} $$

Combining Eqs.(107) and (109) yields

$$ A(x, y_2|e) = A(x|e)A(y_2|e) . \tag{110} $$

• proof of (b)(2 -> 1'): One has

$$ A(x|y_1, y_2, e) \overset{(i)}{=} A(x|e) \overset{(ii)}{=} A(x|y_2, e) . \tag{111} $$

(i) follows from the premise $\tau^A(x \perp y_1, y_2|e)$. Plugging the premise into rule
(a)(2 -> 1) gives (ii).

• proof of (c)(1', 1 -> 2): One has

$$ A(x|y_1, y_2, e) \overset{(i)}{=} A(x|y_2, e) \overset{(ii)}{=} A(x|e) . \tag{112} $$

(i) follows from the part $\tau^A(x \perp y_1|y_2, e)$ of the premise, (ii) follows from the
other part $\tau^A(x \perp y_2|e)$ of the premise.

• proof of (d)(1', 1' -> 2): The premise is that

$$ A(x|y_1, y_2, e) = A(x|y_2, e) , \tag{113} $$

and

$$ A(x|y_1, y_2, e) = A(x|y_1, e) . \tag{114} $$

Thus, if $A(x, y_1, y_2, e) \neq 0$,

$$ A(x|y_2, e) = A(x|y_1, e) = A(x|e) . \tag{115} $$

Combining Eqs.(113) and (115) now yields

$$ A(x|y_1, y_2, e) = A(x|e) . \tag{116} $$

QED

Exercise for reader: Find out whether $\tau^{sepG}$, $\tau^{CMI}$ and $\tau^{CMI'}$ satisfy the
reduction and combination rules.

6.3 Probabilistic I-sets

In this section, we define certain probabilistic I-sets; that is, I-sets whose members
are defined in terms of a probability distribution (or a meta density matrix).

First consider classical physics. For any $P \in pd(St_{x.})$, define

$$ \mathcal{I}(P) = \{I : I = (x_J \perp x_K|x_E);\ J, K, E \subset Z_{1,N} \text{ are disjoint};\ \tau^P(I)\} . \tag{117} $$

Next consider quantum physics. For any meta density matrix $\mu \in dm(\mathcal{H}_{x.})$ of
the form $\mu = \mathrm{proj}(\sum_{x.} A(x.)|x.\rangle)$, let

$$ \mathcal{I}(A) = \{I : I = (x_J \perp x_K|x_E);\ J, K, E \subset Z_{1,N} \text{ are disjoint};\ \tau^A(I)\} . \tag{118} $$

For $\eta = P, A$, when we say that an I-set $\mathcal{I}$ is satisfied by $\eta$, we will mean
that $\tau^\eta(I)$ for all $I \in \mathcal{I}$ (or, equivalently, $\mathcal{I} \subset \mathcal{I}(\eta)$).

7 Bayesian Networks 

In this section, we show that any probability distribution can be represented by a fully 
connected DAG. We also show that any quantum density matrix can be represented 
by a fully connected DAG. In classical and quantum physics, omitting certain arrows 
from this fully connected graph indicates certain probabilistic independencies. 

7.1 Chain Rule and Factorization According to a Graph

In this section, we define a chain rule and factorization according to a DAG, both for
classical and quantum physics.

First consider classical physics. Let $P \in pd(St_{x.})$. For $N = 3$, the P chain
rule is

$$ \underbrace{P(x_3, x_2, x_1)}_{7} = \underbrace{P(x_3|x_2, x_1)}_{4}\ \underbrace{P(x_2|x_1)}_{2}\ \underbrace{P(x_1)}_{1} . \tag{119} $$

We have indicated under each probability the number of degrees of freedom
that it holds, assuming that $x_1, x_2, x_3 \in Bool$. For arbitrary $N$, the P chain rule
is

$$ P(x.) = \prod_{j=1}^{N} P(x_j|x_{Z_{1,j-1}}) . \tag{120} $$
i=i 

Now consider quantum physics. Suppose $\mu \in dm(\mathcal{H}_{x.})$ is a meta density matrix
of the form $\mu = \mathrm{proj}(\sum_{x.} A(x.)|x.\rangle)$. In analogy to Eq.(119), we would like the A
chain rule for $N = 3$ to be

$$ A(x_3, x_2, x_1) = A(x_3|x_2, x_1)A(x_2|x_1)A(x_1) . \tag{121} $$

The P chain rule Eq.(119) was stated without proof, because the equation is
well known, and very easy to prove. On the other hand, the A chain rule Eq.(121) is
new, so we prove it next.

From various definitions in Section 5, we get

$$ \theta(x_3|x_2, x_1) = \theta(x_3, x_2, x_1) - \theta(x_3^0, x_2, x_1) , \tag{122a} $$

$$ \theta(x_2|x_1) = \theta(x_3^0, x_2, x_1) - \theta(x_3^0, x_2^0, x_1) , \tag{122b} $$

and

$$ \theta(x_1) = \theta(x_3^0, x_2^0, x_1) - \theta(x_3^0, x_2^0, x_1^0) . \tag{122c} $$

Summing Eqs.(122) (more precisely, equating the sum of the left hand sides of
Eqs.(122) to the sum of the right hand sides) yields

$$ \theta(x_3|x_2, x_1) + \theta(x_2|x_1) + \theta(x_1) = \theta(x_3, x_2, x_1) . \tag{123} $$

The previous equation, and the P chain rule, together imply:

$$ \underbrace{e^{i\theta(x_3,x_2,x_1)}}_{7}\underbrace{\sqrt{P(x_3,x_2,x_1)}}_{7} = \underbrace{e^{i\theta(x_3|x_2,x_1)}}_{4}\underbrace{\sqrt{P(x_3|x_2,x_1)}}_{4}\ \underbrace{e^{i\theta(x_2|x_1)}}_{2}\underbrace{\sqrt{P(x_2|x_1)}}_{2}\ \underbrace{e^{i\theta(x_1)}}_{1}\underbrace{\sqrt{P(x_1)}}_{1} . \tag{124} $$

We have indicated under each quantity the number of degrees of freedom it holds,
assuming $x_1, x_2, x_3 \in Bool$. The previous equation is equivalent to Eq.(121), which
we set out to prove. For arbitrary $N$, Eq.(123) generalizes to

$$ \theta(x.) = \sum_{j=1}^{N} \theta(x_j|x_{Z_{1,j-1}}) . \tag{125} $$

The previous equation, and Eq.(120) (the P chain rule), together imply the A chain
rule:

$$ A(x.) = \prod_{j=1}^{N} A(x_j|x_{Z_{1,j-1}}) . \tag{126} $$

Note that the conditional amplitudes $A(x_j|x_{Z_{1,j-1}})$ used above have constrained
phases (CP), meaning that their phases are subject to the constraint that
$A(x_j^0|x_{Z_{1,j-1}})$ be real for all $x_{Z_{1,j-1}}$. Let $M$ be the matrix with entries $A(x_j|x_{Z_{1,j-1}})$,
with the rows of $M$ labelled by the states of $x_j$ and the columns labelled by the states
of $x_{Z_{1,j-1}}$. CP means that $M$ must have one row (the one with $x_j = x_j^0$) consisting
entirely of real numbers. On the other hand, Quantum Fog allows conditional amplitudes
$A(x_j|x_{Z_{1,j-1}})$ with free phases (FP), meaning that the phases of $A(x_j|x_{Z_{1,j-1}})$
are arbitrary. Clearly, it is often convenient, not just in Quantum Fog, to allow
FP amplitudes. Luckily, one can always replace an FP amplitude $A(x_j|x_{Z_{1,j-1}})$ by a
product of CP amplitudes. This is how. To simplify our notation, let $x_j \to b$ and
$x_{Z_{1,j-1}} \to a$. Replace an FP amplitude $A(b|a)$ by a product of three CP amplitudes
$A(b|a'')$, $A(a''|a')$ and $A(a'|a)$:

$$ A(b|a) = \sum_{a', a''} A(b|a'')A(a''|a')A(a'|a) , \tag{127} $$

where $a', a'' \in St_a$. $A(b|a)$ can be interpreted as the transition matrix of node $b$ in a
subgraph

      (b) <- (a) .          (128)

This subgraph is being replaced by a Markov-chain graph

      (b) <- (a'') <- (a') <- (a) .          (129)

Define the following matrices:

$$ [A(b|a)] = F , \quad [A(b|a'')] = C_1 , \quad [A(a''|a')] = C_2 , \quad [A(a'|a)] = C_3 . \tag{130} $$

Eq.(127), expressed in matrix form, is

$$ F = C_1 C_2 C_3 . \tag{131} $$

Suppose the first row of $F$ is $[r_1 e^{i\phi_1}, r_2 e^{i\phi_2}, \ldots, r_{N_a} e^{i\phi_{N_a}}]$, where $r_j, \phi_j \in \mathbb{R}$. Let

$$ C_1 = F\, \mathrm{diag}(e^{-i\phi_1}, e^{-i\phi_2}, e^{-i\phi_3}, \ldots, e^{-i\phi_{N_a}}) , \quad C_2 = \mathrm{diag}(1, e^{i\phi_2}, e^{i\phi_3}, \ldots, e^{i\phi_{N_a}}) , \quad C_3 = \mathrm{diag}(e^{i\phi_1}, 1, 1, \ldots, 1) . \tag{132} $$

The matrices $C_1, C_2, C_3$ all have at least one row consisting entirely of reals, so
these matrices specify CP amplitudes. (If global phases are allowed, only 2 C's are
necessary).
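
The replacement of a free-phase matrix $F$ by three constrained-phase factors can be checked with a few lines of code. The following is a minimal sketch, not taken from the original paper: it uses plain Python lists of complex numbers, the helper names (matmul, diag, cp_factors, has_real_row) are hypothetical, and the example matrix is arbitrary (it is not normalized as a transition matrix, since only the phase structure matters here). It builds $C_1$, $C_2$, $C_3$ from the phases of the first row of $F$, then confirms that $C_1 C_2 C_3 = F$ and that each factor has at least one purely real row.

```python
import cmath

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def diag(vals):
    """Square diagonal matrix with the given diagonal entries."""
    n = len(vals)
    return [[complex(vals[i]) if i == j else 0j for j in range(n)] for i in range(n)]

def cp_factors(F):
    """Split F into C1, C2, C3 using the phases of the first row of F."""
    phis = [cmath.phase(z) for z in F[0]]
    n = len(phis)
    C1 = matmul(F, diag([cmath.exp(-1j * p) for p in phis]))   # first row becomes real
    C2 = diag([1] + [cmath.exp(1j * p) for p in phis[1:]])     # first row is (1, 0, ..., 0)
    C3 = diag([cmath.exp(1j * phis[0])] + [1] * (n - 1))       # second row is (0, 1, 0, ...)
    return C1, C2, C3

def has_real_row(M, tol=1e-12):
    """True if some row of M has only (numerically) real entries."""
    return any(all(abs(z.imag) < tol for z in row) for row in M)

# Hypothetical 2 x 3 free-phase matrix: rows labelled by b, columns by a.
F = [[0.6j, 0.5 - 0.5j, -0.3 + 0j],
     [0.8 + 0j, 0.5 + 0.5j, 0.2j]]
C1, C2, C3 = cp_factors(F)
G = matmul(matmul(C1, C2), C3)
err = max(abs(G[i][j] - F[i][j]) for i in range(2) for j in range(3))
print(err < 1e-12, all(has_real_row(C) for C in (C1, C2, C3)))
```

Note that $C_3$ needs at least two columns to contain a purely real row, so the sketch assumes that node $a$ has at least two states.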

We end this section by defining graphic factorization. In classical physics, we
say $P \in pd(St_{x.})$ factors according to $G \in DAG(x.)$ iff

$$ P(x.) = \prod_{j=1}^{N} P(x_j|x_{pa(j)}) . \tag{133} $$

In quantum physics, for a meta density matrix $\mu \in dm(\mathcal{H}_{x.})$ of the form $\mu =
\mathrm{proj}(\sum_{x.} A(x.)|x.\rangle)$, we say $A$ factors according to $G \in DAG(x.)$ iff

$$ A(x.) = \prod_{j=1}^{N} A(x_j|x_{pa(j)}) . \tag{134} $$

By virtue of the P (ditto, A) chain rule, any probability distribution (ditto,
probability amplitude) of $x.$ factors according to an N-node fully-connected DAG. If
the probability distribution (ditto, probability amplitude) has higher symmetry, then
it may also factor according to another N-node graph that possesses fewer arrows than
the fully-connected one.

7.2 Graphic I-sets 

In Section 6.3 we defined some probabilistic I-sets. The elements of a probabilistic
I-set are defined in terms of a probability distribution (or a meta density matrix).
In this section, we define some graphic I-sets for a DAG. The elements of a graphic
I-set are defined with respect to a graph.

For $G \in DAG(x.)$, we define (loc=local, glo=global)

$$ \mathcal{I}_{loc}(G) = \{I : I = (x_j \perp x_{\neg de(j)}|x_{pa(j)}),\ j \in Z_{1,N}\} , \tag{135} $$

and

$$ \mathcal{I}_{glo}(G) = \{I : I = (x_J \perp x_K|x_E);\ J, K, E \subset Z_{1,N} \text{ are disjoint};\ \tau^{sepG}(I)\} . \tag{136} $$

The function $\tau^{sepG} : \mathcal{I}(x.) \to Bool$ will be defined later on.
For example, if $G$ is the graph of Fig.1(a), then

$$ \mathcal{I}_{loc}(G) = \{(x_3 \perp x_2|x_1),\ (x_4 \perp x_1|x_3, x_2)\} . \tag{137} $$

7.3 Graphic Factorization iff an I-set is satisfied 

In this section, we show that a probability distribution (ditto, probability amplitude)
factors according to a DAG iff the probability distribution (ditto, probability
amplitude) satisfies a graphic I-set.

As motivation for the main theorem of this section, let $G$ be the DAG of
Fig.1(a). Note that $\mathcal{I}_{loc}(G) \subset \mathcal{I}(P)$ iff $\tau^P(I)$ for all $I \in \mathcal{I}_{loc}(G)$. Therefore, for the
graph $G$ of Fig.1(a), $\mathcal{I}_{loc}(G) \subset \mathcal{I}(P)$ is equivalent to

$$ \tau^P(x_3 \perp x_2|x_1) \ \text{ and } \ \tau^P(x_4 \perp x_1|x_3, x_2) . \tag{138} $$

Eq.(138) is itself equivalent to

$$ P(x_3|x_2, x_1) = P(x_3|x_1) , \qquad P(x_4|x_3, x_2, x_1) = P(x_4|x_3, x_2) . \tag{139} $$

Define $P_{chain}$ and $P_{graph}$ as the following two probability distributions of $x.$:

$$ P_{chain}(x.) = P(x_4|x_3, x_2, x_1)P(x_3|x_2, x_1)P(x_2|x_1)P(x_1) , \tag{140} $$

and

$$ P_{graph}(x.) = P(x_4|x_3, x_2)P(x_3|x_1)P(x_2|x_1)P(x_1) . \tag{141} $$

$P_{chain}$ comes from the P chain rule and $P_{graph}$ from the definition of factorization
according to the graph of Fig.1(a). From Eqs.(139), (140) and (141), it is clear that:
If $\tau^P(I)$ for all $I \in \mathcal{I}_{loc}(G)$, then $P_{chain} = P_{graph}$. The converse statement is also true.
This is why. $P_{chain} = P_{graph}$ implies

$$ P(x_4|x_3, x_2, x_1)P(x_3|x_2, x_1) = P(x_4|x_3, x_2)P(x_3|x_1) . \tag{142} $$

Summing both sides over $x_4$ gives $P(x_3|x_2, x_1) = P(x_3|x_1)$. Combining this result
with Eq.(142) then gives $P(x_4|x_3, x_2, x_1) = P(x_4|x_3, x_2)$. Thus, Eqs.(139) are obeyed.
We have just proven, albeit only for the graph of Fig.1(a), the following theorem:

Theorem 8 Suppose $G \in DAG(x.)$ and $P \in pd(St_{x.})$.
$P$ factors according to $G$ iff $\mathcal{I}_{loc}(G) \subset \mathcal{I}(P)$.

proof: The proof is a special case of the proof of the next theorem.
QED
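
One direction of Theorem 8 can be illustrated numerically for the four-node example used above, in which $P(x.) = P(x_4|x_3, x_2)P(x_3|x_1)P(x_2|x_1)P(x_1)$. The sketch below is an illustration and not part of the original paper: the conditional probability tables are arbitrary, the helper names (bern, marg, cond) are hypothetical, and tuple index $k$ stands for node $x_{k+1}$. It builds a distribution that factors according to the graph and checks that the two local independencies of the graph hold, i.e. $P(x_3|x_2, x_1) = P(x_3|x_1)$ and $P(x_4|x_3, x_2, x_1) = P(x_4|x_3, x_2)$.

```python
import itertools

def bern(q, v):
    """Probability that a Boolean variable equals v when P(v = 1) = q."""
    return q if v == 1 else 1.0 - q

def p_graph(x):
    """Hypothetical tables for P(x4|x3,x2) P(x3|x1) P(x2|x1) P(x1)."""
    x1, x2, x3, x4 = x
    return (bern(0.7, x1) * bern(0.2 + 0.5 * x1, x2)
            * bern(0.6 - 0.4 * x1, x3) * bern(0.1 + 0.3 * x2 + 0.5 * x3, x4))

states = list(itertools.product((0, 1), repeat=4))
P = {x: p_graph(x) for x in states}

def marg(P, idx, vals):
    """Marginal probability that the variables at positions idx take values vals."""
    return sum(p for s, p in P.items() if all(s[i] == v for i, v in zip(idx, vals)))

def cond(P, tgt, given, x):
    """P(x_tgt | x_given) evaluated at the point x."""
    both = tgt + given
    return (marg(P, both, tuple(x[i] for i in both))
            / marg(P, given, tuple(x[i] for i in given)))

ok = all(
    abs(cond(P, (2,), (1, 0), x) - cond(P, (2,), (0,), x)) < 1e-12
    and abs(cond(P, (3,), (2, 1, 0), x) - cond(P, (3,), (2, 1), x)) < 1e-12
    for x in states)
print(ok)
```

All table entries lie strictly between 0 and 1, so every conditional probability in the check is well defined.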

Theorem 9 Suppose $G \in DAG(x.)$ and $\mu \in dm(\mathcal{H}_{x.})$ is a meta density matrix of
the form $\mu = \mathrm{proj}(\sum_{x.} A(x.)|x.\rangle)$.
$A$ factors according to $G$ iff $\mathcal{I}_{loc}(G) \subset \mathcal{I}(A)$.

proof: Without loss of generality, we can assume that the nodes are labelled so that
$pa(j) \subset Z_{1,j-1}$ for all $j$. This means that we can always add arrows to $G$ until we
generate a fully connected graph $\tilde{G}$ such that $pa(j) = Z_{1,j-1}$. We will call $\tilde{G}$ a proper
fully-connected extension of $G$. What we need to prove can now be rephrased as:

$$ A_{chain} = A_{graph} \ \text{ iff } \ \tau^A(x_j \perp x_{\neg de(j)}|x_{pa(j)}) \ \forall j , \tag{143} $$

where

$$ A_{chain}(x.) = \prod_{j=1}^{N} A(x_j|x_{Z_{1,j-1}}) , \qquad A_{graph}(x.) = \prod_{j=1}^{N} A(x_j|x_{pa(j)}) . \tag{144} $$

($\Leftarrow$) Define $Z'_{1,j-1} = Z_{1,j-1} - pa(j)$. Since $\tau^A(x_j \perp x_{\neg de(j)}|x_{pa(j)})$ and $Z'_{1,j-1} \subset
Z_{1,j-1} \subset \neg de(j)$, it follows from reduction rule 2 -> 1 that $\tau^A(x_j \perp x_{Z'_{1,j-1}}|x_{pa(j)})$.
Thus,

$$ A(x_j|x_{Z_{1,j-1}}) = A(x_j|x_{Z'_{1,j-1}}, x_{pa(j)}) \tag{145a} $$
$$ = A(x_j|x_{pa(j)}) . \tag{145b} $$

($\Rightarrow$) $A_{chain} = A_{graph}$ implies that

$$ \prod_{j=1}^{N} P(x_j|x_{Z_{1,j-1}}) = \prod_{j=1}^{N} P(x_j|x_{pa(j)}) . \tag{146} $$

• Sum both sides of Eq.(146) over $x_{Z_{2,N}}$. Get $P(x_1) = P(x_1)$.

• Divide both sides of Eq.(146) by $P(x_1)$, and then sum both sides over $x_{Z_{3,N}}$.
Get $P(x_2|x_1) = P(x_2|x_{pa(2)})$.

• Divide both sides of Eq.(146) by $P(x_2, x_1)$, and then sum both sides over $x_{Z_{4,N}}$.
Get $P(x_3|x_2, x_1) = P(x_3|x_{pa(3)})$.

• Divide both sides of Eq.(146) by $P(x_3, x_2, x_1)$, and then sum both sides over
$x_{Z_{5,N}}$. Get $P(x_4|x_3, x_2, x_1) = P(x_4|x_{pa(4)})$.

• And so on.

Thus, by induction,

$$ P(x_j|x_{Z_{1,j-1}}) = P(x_j|x_{pa(j)}) \tag{147} $$

for all $j$.

$A_{chain} = A_{graph}$ also implies that

$$ \sum_{j=1}^{N} \theta(x_j|x_{Z_{1,j-1}}) = \sum_{j=1}^{N} \theta(x_j|x_{pa(j)}) . \tag{148} $$

Recall that $\theta(x_j^0|x_{Z_{1,j-1}}) = 0$.

• Set $x_{Z_{2,N}} \to x^0_{Z_{2,N}}$ in Eq.(148). Get $\theta(x_1) = \theta(x_1)$.

• Set $x_{Z_{3,N}} \to x^0_{Z_{3,N}}$ in Eq.(148) and subtract $\theta(x_1)$ from both sides. Get $\theta(x_2|x_1) =
\theta(x_2|x_{pa(2)})$.

• Set $x_{Z_{4,N}} \to x^0_{Z_{4,N}}$ in Eq.(148) and subtract $\theta(x_2, x_1)$ from both sides. Get
$\theta(x_3|x_2, x_1) = \theta(x_3|x_{pa(3)})$.

• Set $x_{Z_{5,N}} \to x^0_{Z_{5,N}}$ in Eq.(148) and subtract $\theta(x_3, x_2, x_1)$ from both sides. Get
$\theta(x_4|x_3, x_2, x_1) = \theta(x_4|x_{pa(4)})$.

• And so on.

Thus, by induction,

$$ \theta(x_j|x_{Z_{1,j-1}}) = \theta(x_j|x_{pa(j)}) \tag{149} $$

for all $j$. Combining Eqs.(147) and (149) yields

$$ A(x_j|x_{Z_{1,j-1}}) = A(x_j|x_{pa(j)}) \tag{150} $$

for all $j$. Note that $\neg de(j) \supset Z_{1,j-1}$. From a proper fully-connected extension of $G$,
it is clear that

$$ A(x_j|x_{\neg de(j)}) = A(x_j|x_{Z_{1,j-1}}) \tag{151} $$

for all $j$. Combining the previous two equations yields

$$ A(x_j|x_{\neg de(j)}) = A(x_j|x_{pa(j)}) \tag{152} $$

for all $j$. Define $\neg de'(j) = \neg de(j) - pa(j)$. The previous equation can be written as
$A(x_j|x_{\neg de'(j)}, x_{pa(j)}) = A(x_j|x_{pa(j)})$, which means that $\tau^A(x_j \perp x_{\neg de'(j)}|x_{pa(j)})$. Thus,
$\tau^A(x_j \perp x_{\neg de(j)}|x_{pa(j)})$.

QED 

7.4 Going Global 

In the last section, we showed that a probability distribution P (or a probability 
amplitude A) factors according to a DAG iff it satisfies a certain non-global, graphic 
I-set. Does a similar result hold if the non-global graphic I-set is replaced by a global 
graphic one? This section will be devoted to answering this question.

Figure 5: Some simple Bayesian nets and their truth value for $\tau^P(x \perp y|\cdot)$, which
is the same as their truth value for $\tau^A(x \perp y|\cdot)$. The independency $(x \perp y|\cdot)$ is
conditioned on the grounded nodes. T=true, F=false.

To develop some intuition, we begin by considering Fig.5, which shows some
simple Bayesian net examples.

Column 1 of Fig.5 shows four DAGs in which, respectively, node $a$ is:

1. a serial node of a path from $x$ to $y$,

2. ("common cause" graph) a divergence node of a path from $x$ to $y$,

3. ("common effect", "explaining away" graph) a convergence (a.k.a. collider)
node of a path from $x$ to $y$,

4. the descendant of a collider node of a path from $x$ to $y$.

Column 2 of Fig.5 illustrates two special cases of the graphs in column 1: (1)
no node is grounded, (2) only node $a$ is grounded. Nodes decorated with a $\Sigma$ are
summed over for $\eta = P$ and traced over for $\eta = A$.

In Fig.5, the argument of $\tau^\eta$ is an independency whose third slot is filled with
the grounded nodes. In Fig.5, the grounded nodes are always either $a$ or nothing. An
independency with no grounded nodes is unconditional. In the classical physics case,
column 3 of Fig.5 gives the truth values (T=true, F=false) of $\tau^P(I)$, for the graphs
in column 2. In the quantum physics case, column 3 gives the values of $\tau^A(I)$.

Next, we show how we calculated the truth values of $\tau^A(I)$ in Fig.5. Let
$A = \sqrt{P}e^{i\theta}$.

Rows 1 and 2 (graphs in which $a$ is a serial node) satisfy

$$ A(x, y, a) = A(y|a)A(a|x)A(x) = A(y|a)A(x|a)A(a) , \tag{153} $$

for all $x, y, a$. Eq.(153) implies

$$ P(x, y) = \sum_a P(y|a)P(x|a)P(a) \neq P(x)P(y) , \qquad A(x, y) \neq A(x)A(y) . \tag{154} $$

$\tau^P(x \perp y)$ is false so $\tau^A(x \perp y)$ is false too. Eq.(153) implies

$$ A(x, y|a) = A(y|a)A(x|a) , \tag{155} $$

so $\tau^A(x \perp y|a)$ is true.

Rows 3 and 4 with divergence node $a$ must have the same truth values as rows
1 and 2 with serial node $a$. That's because the Bayesian nets

      (x) -> (a) -> (y)          (156)

and

      (x) <- (a) -> (y)          (157)

are indistinguishable: they both represent the same full joint amplitude. Indeed,
$A_1(x, y, a) = A(y|a)A(a|x)A(x)$ for the first and $A_2(x, y, a) = A(y|a)A(x|a)A(a)$ for
the second, and $A_1 = A_2$ (see footnote 5).

Rows 5 and 6 (graphs in which $a$ is a collider node) satisfy

$$ A(x, y, a) = A(a|x, y)A(y)A(x) , \tag{158} $$

for all $x, y, a$. Therefore, $P(x, y) = P(x)P(y)$, $\theta(x, y) = \theta(x) + \theta(y)$. Hence, $\tau^A(x \perp y)$
is true. $\tau^P(x \perp y|a)$ is false so $\tau^A(x \perp y|a)$ is false too.

5 Sometimes, some of the arrows of a classical Bayesian net can be reversed without changing the full
joint probability distribution of the net. General rules have been given in the literature (see Ref.[1])
for deciding which arrows can be reversed with impunity. Similar rules apply for quantum Bayesian
nets.

Rows 7 and 8 (graphs in which $a$ is a descendant of a collider node) satisfy

$$ A(x, y, a, b) = A(a|b)A(b|x, y)A(x)A(y) , \tag{159} $$

for all $x, y, a, b$. Therefore, $P(x, y) = P(x)P(y)$, $\theta(x, y) = \theta(x) + \theta(y)$. Hence, $\tau^A(x \perp y)$
is true. $\tau^P(x \perp y|a)$ is false so $\tau^A(x \perp y|a)$ is false too.

Note that the calculations of the truth values of $\tau^P(I)$ in Fig.5 are a special
case of the just presented calculations of the truth values of $\tau^A(I)$.

The moral of Fig.5 is that grounding a serial node or a divergence node
interrupts information transmission between $x$ and $y$. A non-vacuous message has
variation in it, and a grounded node in its path prevents transmission of this variation.
However, grounding a collider or a descendant of a collider has the opposite effect: it
allows information transmission (this is called the "explaining away" phenomenon).

So far, this section has presented merely anecdotal evidence. Next, we will 
state and prove some general theorems. 

Consider any $G \in DAG(x.)$. Suppose $J, K, E \subset Z_{1,N}$ are disjoint sets. Let
$I = (x_J \perp x_K|x_E)$. We will abbreviate "dependency separation" by "d-sep" or just
"sep". We define the function $\tau^{sepG} : \mathcal{I}(x.) \to Bool$ by the statement: $\tau^{sepG}(I)$ is
true iff all paths $\gamma$ in $G$ from a node in $x_J$ to a node in $x_K$ are blocked by $x_E$. We say
"$\gamma$ is blocked by $x_E$" iff there exists a node $x_i \in \gamma$ that satisfies one of the following:

1. $x_i$ is a non-collider of $\gamma$ and $i \in E$.

2. $x_i$ is a collider of $\gamma$ and $de(i) \cap E = \emptyset$.
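
The definition of $\tau^{sepG}$ can be turned into a brute-force check for small graphs. The following is a minimal sketch, not the paper's own algorithm: it enumerates every simple path of the undirected skeleton, which is only practical for a handful of nodes, and it assumes that $de(i)$ includes node $i$ itself. The function names and the edge-list representation (a pair (u, v) stands for an arrow from u to v) are choices made for this example. The usage lines reproduce the four cases discussed for Fig.5.

```python
def descendants(children, i):
    """Nodes reachable from i by following arrows, including i itself (an assumption)."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v in children.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def simple_paths(nbrs, s, t, visited=()):
    """Yield every simple path from s to t in the undirected skeleton."""
    visited = visited + (s,)
    if s == t:
        yield visited
        return
    for v in nbrs.get(s, ()):
        if v not in visited:
            yield from simple_paths(nbrs, v, t, visited)

def d_separated(edges, J, K, E):
    """True iff every path from a node of J to a node of K is blocked by E."""
    children, parents, nbrs = {}, {}, {}
    for u, v in edges:                      # (u, v) means an arrow u -> v
        children.setdefault(u, set()).add(v)
        parents.setdefault(v, set()).add(u)
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    E = set(E)

    def blocked(path):
        for a, i, b in zip(path, path[1:], path[2:]):   # interior nodes only
            collider = a in parents.get(i, ()) and b in parents.get(i, ())
            if collider:
                if not (descendants(children, i) & E):
                    return True
            elif i in E:
                return True
        return False

    return all(blocked(p) for s in J for t in K for p in simple_paths(nbrs, s, t))

serial = [("x", "a"), ("a", "y")]
fork = [("a", "x"), ("a", "y")]
collider = [("x", "a"), ("y", "a")]
below_collider = [("x", "b"), ("y", "b"), ("b", "a")]
for g in (serial, fork, collider, below_collider):
    print(d_separated(g, {"x"}, {"y"}, set()), d_separated(g, {"x"}, {"y"}, {"a"}))
```

Only interior nodes of a path are tested, because the end nodes belong to $J$ or $K$, which are disjoint from $E$, and an end node is never a collider of the path.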

Theorem 10 (Classical d-Separation Theorem) Suppose $G \in DAG(\underline{x}.)$ and $P \in pd(St_{\underline{x}.})$.
If $P$ factors according to $G$ then $\mathcal{I}_{glo}(G) \subset \mathcal{I}(P)$.

proof: The proof of this theorem can be found in the literature [1][2].
QED

Theorem 11 (Quantum d-Separation Theorem) Suppose $G \in DAG(\underline{x}.)$ and $\mu \in dm(\mathcal{H}_{\underline{x}.})$
is a meta density matrix of the form $\mu = proj(\sum_{x.} A(x.)|x.\rangle)$. If $A$ factors
according to $G$ then $\mathcal{I}_{glo}(G) \subset \mathcal{I}(A)$.

proof: The proof of this theorem is a simple generalization of the proof of Theorem 10.
QED

One can also prove a weak converse of the d-Separation Theorem. The weak 
converse theorem [1] shows that $\mathcal{I}_{glo}(G)$ is in some sense the maximal set for which
the d-Separation Theorem holds. For this reason, Ref.p] describes the d-Separation 
Theorem as a proof of soundness and its weak converse as a proof of completeness. 

8 Markov Networks



In this section, we show that any probability distribution can be represented by a fully 
connected UG. We also show that any quantum density matrix can be represented by 
a fully connected UG. In classical and quantum physics, omitting certain links from 
this fully connected graph indicates certain probabilistic independencies. 

8.1 Power-set Rule and Factorization According to a Graph

In this section, we define a power-set rule and factorization according to an UG, both 
for classical and quantum physics. The P power-set rule is well known, but not by 
that name, which is ours. In some sense, the P (ditto, A) power-set rule is to Markov 
nets what the P (ditto, A) chain rule is to Bayesian nets. 

Theorem 12 (P Power-set Rule) Any $P \in pd(St_{\underline{x}.})$ can be expressed as

$$P(x.) = \prod_{J: J \subset Z_{1,N}} e^{\lambda(x_J)} , \tag{160}$$

where $\lambda(x_J)$ is defined by

$$\lambda(x_J) = \sum_{J': J' \subset J} (-1)^{|J - J'|} \ln P(x_{J'}, x^o_{(J')^c}) . \tag{161}$$

(Note that if for some point $x.'$, $P(x.') = 0$, then $\lambda(x'_J) = -\infty$ for some $J$. Instead
of permitting such infinities, as we do, some authors restrict this theorem by adding
a premise that $P \neq 0$.)

proof: The proof is a special case of the proof of the next theorem. 
QED 

Theorem 13 (A Power-set Rule) Given a meta density matrix $\mu \in dm(\mathcal{H}_{\underline{x}.})$ of the
form $\mu = proj(\sum_{x.} A(x.)|x.\rangle)$, $A$ can be expressed as

$$A(x.) = \prod_{J: J \subset Z_{1,N}} e^{\lambda(x_J)} , \tag{162}$$

where $\lambda(x_J)$ is defined by

$$\lambda(x_J) = \sum_{J': J' \subset J} (-1)^{|J - J'|} \ln A(x_{J'}, x^o_{(J')^c}) . \tag{163}$$

(Note that if for some point $x.'$, $A(x.') = 0$, then $Re(\lambda(x'_J)) = -\infty$ for some $J$.
Instead of permitting such infinities, as we do, some authors restrict this theorem by
adding a premise that $A \neq 0$.)

proof: Performing a Möbius inversion (see Appendix A) on Eq. (163), we get

$$\ln A(x_J, x^o_{(J)^c}) = \sum_{J': J' \subset J} \lambda(x_{J'}) . \tag{164}$$

Replacing $J$ by $Z_{1,N}$ in the previous equation yields:

$$\ln A(x.) = \sum_{J: J \subset Z_{1,N}} \lambda(x_J) . \tag{165}$$

QED

For $N$ random variables, the P (ditto, A) chain rule contains $N$ factors whereas
the P (ditto, A) power-set rule contains $2^N$. Thus, a power-set rule is not as useful
as a chain rule for practical purposes like numerical calculation. It is mainly used to 
prove other theorems. 
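
The P power-set rule is easy to verify numerically on a small example. The sketch below is illustrative only: the three-variable distribution `P`, the reference point `ref` (playing the role of $x^o$), the 0-based node labels, and the helper names `subsets` and `make_lambda` are all assumptions introduced for the example. It builds $\lambda(x_J)$ from Eq. (161) and checks that the product over all $2^N$ subsets reproduces $P$, as in Eq. (160).

```python
from itertools import combinations, product
from math import exp, isclose, log

def subsets(s):
    """Yield every subset of s (including the empty set and s) as a frozenset."""
    s = tuple(s)
    for r in range(len(s) + 1):
        for c in combinations(s, r):
            yield frozenset(c)

def make_lambda(P, ref, N):
    """Return lam(J, x) implementing Eq. (161) with reference point `ref`."""
    def lam(J, x):
        total = 0.0
        for Jp in subsets(J):
            # Point that agrees with x on J' and with the reference elsewhere.
            pt = tuple(x[i] if i in Jp else ref[i] for i in range(N))
            total += (-1) ** (len(J) - len(Jp)) * log(P[pt])
        return total
    return lam

# Hypothetical strictly positive distribution on N = 3 binary variables.
N = 3
weights = {pt: 1.0 + pt[0] + 2 * pt[1] * pt[2] + 0.5 * pt[0] * pt[2]
           for pt in product((0, 1), repeat=N)}
Z = sum(weights.values())
P = {pt: w / Z for pt, w in weights.items()}

lam = make_lambda(P, ref=(0, 0, 0), N=N)
all_J = list(subsets(range(N)))      # the 2**N subsets of {0, ..., N-1}

for x in P:
    rebuilt = exp(sum(lam(J, x) for J in all_J))
    assert isclose(rebuilt, P[x], rel_tol=1e-9)
print(len(all_J), "factors reproduce P at all", len(P), "points")
```

The example takes $P > 0$ everywhere so that all logarithms are finite; the infinite case discussed in the note to Theorem 12 is not handled.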

We end this section by defining graphic factorization. In classical physics, we
say $P \in pd(St_{\underline{x}.})$ factors according to $G \in UG(\underline{x}.)$ iff $P$ can be expressed in the
form of

$$P(x.) = \prod_{J \in super\text{-}cliques(G)} e^{\lambda(x_J)} . \tag{166}$$

In quantum physics, for a meta density matrix $\mu \in dm(\mathcal{H}_{\underline{x}.})$ of the form
$\mu = proj(\sum_{x.} A(x.)|x.\rangle)$, we say $A$ factors according to $G \in UG(\underline{x}.)$ iff $A$ can be
expressed in the form of

$$A(x.) = \prod_{J \in super\text{-}cliques(G)} e^{\lambda(x_J)} . \tag{167}$$

When $G$ is fully connected, Eq. (167) reduces to $A(x.) = e^{\lambda(x.)}$, which is always
possible. Thus, any probability distribution (ditto, probability amplitude) of
$\underline{x}.$ factors according to an $N$-node fully-connected UG. If the probability distribution
(ditto, probability amplitude) has higher symmetry, then it may also factor according
to another $N$-node graph that possesses fewer links than the fully-connected one.
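
As a hedged illustration of a distribution with "higher symmetry", consider a three-node chain 0 - 1 - 2 (0-based labels). The potentials `phi01` and `phi12`, the reference point `ref`, and the helper names below are assumptions made up for this sketch. For such a chain, the terms $\lambda(x_J)$ of Eq. (161) should vanish whenever $J$ contains the unlinked pair {0, 2}, leaving only terms that fit inside the super-cliques {0, 1} and {1, 2}.

```python
from itertools import combinations, product
from math import log

def subsets(s):
    """Yield every subset of s as a frozenset."""
    s = tuple(s)
    for r in range(len(s) + 1):
        for c in combinations(s, r):
            yield frozenset(c)

def lam(P, J, x, ref):
    """lambda(x_J) of Eq. (161) for distribution P and reference point ref."""
    total = 0.0
    for Jp in subsets(J):
        pt = tuple(x[i] if i in Jp else ref[i] for i in range(len(ref)))
        total += (-1) ** (len(J) - len(Jp)) * log(P[pt])
    return total

# Hypothetical positive potentials on the links 0-1 and 1-2 of the chain.
phi01 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0}
phi12 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.5, (1, 1): 4.0}
w = {x: phi01[x[0], x[1]] * phi12[x[1], x[2]] for x in product((0, 1), repeat=3)}
Z = sum(w.values())
P = {x: v / Z for x, v in w.items()}
ref = (0, 0, 0)

# Subsets that contain both unlinked nodes 0 and 2.
unlinked = (frozenset({0, 2}), frozenset({0, 1, 2}))
worst = max(abs(lam(P, J, x, ref)) for J in unlinked for x in P)
print(worst)                                    # zero up to rounding error
print(lam(P, frozenset({0, 1}), (1, 1, 0), ref))  # a link term, generally nonzero
```

The second printed value equals the log of the cross ratio of `phi01`, here ln 3, which shows that the surviving terms carry the link information.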

8.2 Graphic I-sets 

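In Section 7.2 we defined some graphic I-sets for a DAG. In this section, we define
some graphic I-sets for an UG.

For $G \in UG(\underline{x}.)$, we define (loc=local, glo=global)

$$\mathcal{I}_{pair}(G) = \{ I : I = (\underline{x}_{j_1} \perp \underline{x}_{j_2} | \underline{x}_{Z_{1,N} - \{j_1, j_2\}}),\ j_1 \notin ne(j_2);\ j_1, j_2 \in Z_{1,N} \} , \tag{168}$$

$$\mathcal{I}_{loc}(G) = \{ I : I = (\underline{x}_j \perp \underline{x}_{Z_{1,N} - \overline{ne}(j)} | \underline{x}_{ne(j)}),\ j \in Z_{1,N} \} , \tag{169}$$

and

$$\mathcal{I}_{glo}(G) = \{ I : I = (\underline{x}_J \perp \underline{x}_K | \underline{x}_E);\ J, K, E \subset Z_{1,N} \text{ are disjoint};\ \tau^{sep}_G(I) \} . \tag{170}$$

The function $\tau^{sep}_G : \mathcal{I}(\underline{x}.) \to Bool$ will be defined later on.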

8.3 Graphic Factorization iff an I-set is satisfied 

In this section, we show that a probability distribution (ditto, probability amplitude)
factors according to an UG iff the probability distribution (ditto, probability
amplitude) satisfies a graphic I-set.

Theorem 14 Suppose $G \in UG(\underline{x}.)$ and $P \in pd(St_{\underline{x}.})$.
$P$ factors according to $G$ $\iff$ $\mathcal{I}_{pair}(G) \subset \mathcal{I}(P)$.

proof: The proof is a special case of the proof of the next theorem. 
QED 

Theorem 15 Suppose $G \in UG(\underline{x}.)$ and $\mu \in dm(\mathcal{H}_{\underline{x}.})$ is a meta density matrix of the
form $\mu = proj(\sum_{x.} A(x.)|x.\rangle)$.
$A$ factors according to $G$ $\iff$ $\mathcal{I}_{pair}(G) \subset \mathcal{I}(A)$.

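proof: If the number of nodes $N$ is one then the theorem is satisfied trivially, so
assume $N \geq 2$. Recall $A$ factors according to $G$ iff

$$A(x.) = \prod_{J \in super\text{-}cliques(G)} e^{\lambda(x_J)} . \tag{171}$$

Note that $\mathcal{I}_{pair}(G) \subset \mathcal{I}(A)$ iff $\tau_A(\underline{x}_{j_1} \perp \underline{x}_{j_2} | \underline{x}_{Z_{1,N} - \{j_1, j_2\}})$ for all $j_1, j_2 \in Z_{1,N}$ such that
$j_1 \notin ne(j_2)$.

($\Leftarrow$) (This direction would require a premise $A \neq 0$ if we weren't permitting
infinite $|\lambda(x_J)|$.) Consider any $J \subset Z_{1,N}$. Suppose $j_1, j_2$ are any two elements of $J$
(there may or may not be a link between $\underline{x}_{j_1}$ and $\underline{x}_{j_2}$ at this point). Let $J^-$ denote
$J - \{j_1, j_2\}$. Note that for any function $f : 2^J \to C$,

$$\sum_{J': J' \subset J} f_{J'} = \sum_{J^{-\prime}: J^{-\prime} \subset J^-} \left( f_{J^{-\prime}} + f_{J^{-\prime} \cup \{j_1, j_2\}} + f_{J^{-\prime} \cup \{j_1\}} + f_{J^{-\prime} \cup \{j_2\}} \right) . \tag{172}$$

Now define $\xi$ by

$$\xi = (x_{J^{-\prime}}, x^o_{Z_{1,N} - \{j_1, j_2\} - J^{-\prime}}) . \tag{173}$$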
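Using Eq. (172), Eq. (163) can be re-written as

$$\lambda(x_J) = \sum_{J^{-\prime}: J^{-\prime} \subset J^-} (-1)^{|J^- - J^{-\prime}|} \ln \frac{A(x^o_{j_1}, x^o_{j_2}, \xi)\, A(x_{j_1}, x_{j_2}, \xi)}{A(x_{j_1}, x^o_{j_2}, \xi)\, A(x^o_{j_1}, x_{j_2}, \xi)} . \tag{174}$$
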

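If $\underline{x}_{j_1}$ and $\underline{x}_{j_2}$ are not in the same super-clique, then there is no link between them.
Then $j_1 \notin ne(j_2)$, so $\tau_A(\underline{x}_{j_1} \perp \underline{x}_{j_2} | \underline{x}_{Z_{1,N} - \{j_1, j_2\}})$, so

$$A(x'_{j_1}, x'_{j_2} | \xi) = A(x'_{j_1} | \xi)\, A(x'_{j_2} | \xi) , \tag{175}$$

for all values $x'_{j_1}$ of $\underline{x}_{j_1}$ and $x'_{j_2}$ of $\underline{x}_{j_2}$. When Eq. (175) is true, the right hand side of
Eq. (174) vanishes. In conclusion, if $j_1, j_2 \in J$ but $\underline{x}_{j_1}$ and $\underline{x}_{j_2}$ are not in the same
super-clique, then $\lambda(x_J) = 0$. In general, $A(x.) = \prod_{J: J \subset Z_{1,N}} e^{\lambda(x_J)}$. (This would
require $A \neq 0$ if infinite $|\lambda(x_J)|$ were not permitted.) But we have shown that $\lambda(x_J)$
vanishes for any $J \subset Z_{1,N}$ which is not a super-clique of $G$. Thus, Eq. (171) follows.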

($\Rightarrow$) Let $j_1, j_2 \in Z_{1,N}$ such that $j_1 \notin ne(j_2)$. Define $R = Z_{1,N} - \{j_1, j_2\}$. $\underline{x}_{j_1}$
and $\underline{x}_{j_2}$ must belong to different super-cliques of $G$. This fact and Eq. (171) together
imply that there exist sets $R_1, R_2$ (not necessarily disjoint) such that $R = R_1 \cup R_2$
and such that $A(x.)$ can be expressed as a product of two terms as follows:

$$A(x_{j_1}, x_{j_2}, x_R) = a_1(x_{j_1}, x_{R_1})\, a_2(x_{j_2}, x_{R_2}) . \tag{176}$$

As usual, let $A(x.) = \sqrt{P(x.)}\, e^{i\theta(x.)}$. The previous equation implies that $P(x.)$ can be
expressed as a product of two terms as follows:

$$P(x_{j_1}, x_{j_2}, x_R) = q_1(x_{j_1}, x_{R_1})\, q_2(x_{j_2}, x_{R_2}) . \tag{177}$$

Summing both sides of Eq. (177) over $x_{j_2}$, over $x_{j_1}$, and over both, gives, respectively,

$$P(x_{j_1}, x_R) = q_1(x_{j_1}, x_{R_1})\, q_2(x_{R_2}) , \tag{178}$$

$$P(x_{j_2}, x_R) = q_1(x_{R_1})\, q_2(x_{j_2}, x_{R_2}) , \tag{179}$$

and

$$P(x_R) = q_1(x_{R_1})\, q_2(x_{R_2}) . \tag{180}$$

From Eqs. (177) to (180), it is clear that

$$P(x_{j_1}, x_{j_2} | x_R) = P(x_{j_1} | x_R)\, P(x_{j_2} | x_R) . \tag{181}$$

Eq. (176) implies that $\theta(x.)$ can be expressed as a sum of two terms as follows:

$$\theta(x_{j_1}, x_{j_2}, x_R) = \omega_1(x_{j_1}, x_{R_1}) + \omega_2(x_{j_2}, x_{R_2}) . \tag{182}$$
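
Eq. (182) immediately yields

$$\theta(x_{j_1}, x_{j_2} | x_R) = \theta(x_{j_1}, x_{j_2}, x_R) - \theta(x^o_{j_1}, x^o_{j_2}, x_R) \tag{183a}$$

$$= [\omega_1(x_{j_1}, x_{R_1}) + \omega_2(x_{j_2}, x_{R_2})] - [\omega_1(x^o_{j_1}, x_{R_1}) + \omega_2(x^o_{j_2}, x_{R_2})] , \tag{183b}$$

$$\theta(x_{j_1} | x_R) = \theta(x_{j_1}, x^o_{j_2} | x_R) \tag{184a}$$

$$= \omega_1(x_{j_1}, x_{R_1}) - \omega_1(x^o_{j_1}, x_{R_1}) , \tag{184b}$$

and

$$\theta(x_{j_2} | x_R) = \theta(x^o_{j_1}, x_{j_2} | x_R) \tag{185a}$$

$$= \omega_2(x_{j_2}, x_{R_2}) - \omega_2(x^o_{j_2}, x_{R_2}) . \tag{185b}$$

Thus,

$$\theta(x_{j_1}, x_{j_2} | x_R) = \theta(x_{j_1} | x_R) + \theta(x_{j_2} | x_R) . \tag{186}$$

Combining Eqs. (181) and (186), we get

$$A(x_{j_1}, x_{j_2} | x_R) = A(x_{j_1} | x_R)\, A(x_{j_2} | x_R) . \tag{187}$$

QED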



8.4 Going Global 

In the last section, we showed that a probability distribution P (or a probability 
amplitude A) factors according to an UG iff it satisfies a certain non-global, graphic 
I-set. Does a similar result hold if the non-global graphic I-set is replaced by a global 
graphic one? This section will be devoted to answering this question. 

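Consider any $G \in UG(\underline{x}.)$. Suppose $J, K, E \subset Z_{1,N}$ are disjoint sets. Let
$I = (\underline{x}_J \perp \underline{x}_K | \underline{x}_E)$. We define the function $\tau^{sep}_G : \mathcal{I}(\underline{x}.) \to Bool$ by the statement:
$\tau^{sep}_G(I)$ is true iff all paths $\gamma$ in $G$ from a node in $\underline{x}_J$ to a node in $\underline{x}_K$ are blocked by
$\underline{x}_E$. We say "$\gamma$ is blocked by $\underline{x}_E$" iff there exists a node $\underline{x}_t \in \gamma$ that satisfies $t \in E$.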
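
For an undirected graph, "all paths from $J$ to $K$ are blocked by $E$" is the same as "no node of $K$ can be reached from $J$ once the nodes of $E$ are deleted". The sketch below uses this reformulation with a breadth-first search. The function name `ug_separated`, the link-list encoding, and the four-node cycle used as an example are assumptions made for illustration.

```python
from collections import deque

def ug_separated(links, J, K, E):
    """True iff every path in the UG from a node in J to a node in K hits E."""
    nbrs = {}
    for u, v in links:                  # each pair (u, v) is an undirected link
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    K, E = set(K), set(E)
    queue, seen = deque(J), set(J)
    while queue:
        n = queue.popleft()
        if n in K:
            return False                # reached K without passing through E
        for m in nbrs.get(n, ()):
            if m not in seen and m not in E:
                seen.add(m)
                queue.append(m)
    return True

# Hypothetical example: the four-node cycle 1 - 2 - 3 - 4 - 1.
cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(ug_separated(cycle, {1}, {3}, {2, 4}))  # True: both routes are blocked
print(ug_separated(cycle, {1}, {3}, {2}))     # False: the route via 4 is open
```

Unlike the DAG case, no collider rule is needed, so a single graph search suffices and the cost is linear in the number of links.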

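Theorem 16 Suppose $G \in UG(\underline{x}.)$ and $P \in pd(St_{\underline{x}.})$. For $\xi \in \{glo, loc, pair\}$, let
$\Phi_\xi$ denote the statement $\mathcal{I}_\xi(G) \subset \mathcal{I}(P)$.

$$\Phi_{glo} \Rightarrow \Phi_{loc} \Rightarrow \Phi_{pair} .$$

If $P \neq 0$, $\Phi_{glo} \iff \Phi_{loc} \iff \Phi_{pair}$.

proof: The proof is a special case of the proof of the next theorem.
QED

Theorem 17 Suppose $G \in UG(\underline{x}.)$ and $\mu \in dm(\mathcal{H}_{\underline{x}.})$ is a meta density matrix of the
form $\mu = proj(\sum_{x.} A(x.)|x.\rangle)$. For $\xi \in \{glo, loc, pair\}$, let $\Phi_\xi$ denote the statement
$\mathcal{I}_\xi(G) \subset \mathcal{I}(A)$.

$$\Phi_{glo} \Rightarrow \Phi_{loc} \Rightarrow \Phi_{pair} .$$

If $A \neq 0$, $\Phi_{glo} \iff \Phi_{loc} \iff \Phi_{pair}$.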

proof:

proof that $\Phi_{glo} \Rightarrow \Phi_{loc}$: Let $I = (\underline{x}_j \perp \underline{x}_{Z_{1,N} - \overline{ne}(j)} | \underline{x}_{ne(j)}) \in \mathcal{I}_{loc}(G)$. $\tau^{sep}_G(I)$
so $\tau_A(I)$.

proof that $\Phi_{loc} \Rightarrow \Phi_{pair}$: Suppose $j_1, j_2 \in Z_{1,N}$ and $j_1 \notin ne(j_2)$.
$\tau_A(\underline{x}_{j_1} \perp \underline{x}_{Z_{1,N} - \overline{ne}(j_1)} | \underline{x}_{ne(j_1)})$ and $j_2 \in Z_{1,N} - \overline{ne}(j_1)$ so, using the reduction rule $2 \to 1'$, we get
$\tau_A(\underline{x}_{j_1} \perp \underline{x}_{j_2} | \underline{x}_{Z_{1,N} - \{j_1, j_2\}})$.

proof that $(A \neq 0, \Phi_{pair}) \Rightarrow \Phi_{glo}$: Suppose $J, K, E \subset Z_{1,N}$ are disjoint sets.
Let $I = (\underline{x}_J \perp \underline{x}_K | \underline{x}_E)$. Note that $\Phi_{pair}$ is equivalent to: $\tau_A(\underline{x}_{j_1} \perp \underline{x}_{j_2} | \underline{x}_{Z_{1,N} - \{j_1, j_2\}})$ for
$j_1 \notin ne(j_2)$. What we want to prove is $\Phi_{glo}$, which is equivalent to: $\tau^{sep}_G(I) \Rightarrow \tau_A(I)$.

If we can prove the theorem when $|J| + |K| + |E| = N$, then the other cases
will follow. This is why. Suppose $|J| + |K| + |E| < N$ and $r \in Z_{1,N} - J - K - E$.
Assume $\tau^{sep}_G(I)$. Since $\tau^{sep}_G(I)$ is true, either $\tau^{sep}_G(\underline{x}_J, \underline{x}_r \perp \underline{x}_K | \underline{x}_E)$ or
$\tau^{sep}_G(\underline{x}_J \perp \underline{x}_K, \underline{x}_r | \underline{x}_E)$ must be true. For if both were false, there would be a path from a node
in $\underline{x}_J$ to a node in $\underline{x}_K$ that was not blocked by $\underline{x}_E$, contradicting $\tau^{sep}_G(I)$. In general,
all the nodes that are not in $\underline{x}_J, \underline{x}_K, \underline{x}_E$ can be put in either the $J$ side (if they
are d-separated from the $K$ side) or the $K$ side (if they are d-separated from the $J$
side). Thus, there exist disjoint sets $J_{fat}$ and $K_{fat}$ such that $J_{fat} \supset J$, $K_{fat} \supset K$,
$|J_{fat}| + |K_{fat}| + |E| = N$, and such that $I_{fat} = (\underline{x}_{J_{fat}} \perp \underline{x}_{K_{fat}} | \underline{x}_E)$ satisfies $\tau^{sep}_G(I_{fat})$.
If we can prove that $\tau^{sep}_G(I_{fat}) \Rightarrow \tau_A(I_{fat})$, then, by virtue of the reduction rule
$2 \to 1$, $\tau_A(I)$ will follow.

It now remains for us to prove the theorem for the fat case when $|J| + |K| + |E| = N$.
The proof is by induction on $|J| + |K|$.

When $|J| + |K| = 2$, $J = \{j\}$, $K = \{k\}$, $I = (\underline{x}_j \perp \underline{x}_k | \underline{x}_E)$. Assume $\tau^{sep}_G(I)$.
It follows that $j \notin ne(k)$. Hence, $I \in \mathcal{I}_{pair}(G)$. Hence, $\tau_A(I)$.

Now assume $\tau^{sep}_G(I) \Rightarrow \tau_A(I)$ when $|J| + |K| \in Z_{2,a}$ and try to prove it for
$|J| + |K| = a + 1 > 2$. Either $|J|$ or $|K|$ is at least two, so we may assume,
without loss of generality, that $|K| \geq 2$. Let $k \in K$ and $K' = K - \{k\}$. Let
$I_1 = (\underline{x}_J \perp \underline{x}_{K'} | \underline{x}_{E \cup \{k\}})$ and $I_2 = (\underline{x}_J \perp \underline{x}_k | \underline{x}_{E \cup K'})$. Assume $\tau^{sep}_G(I)$. It follows that
$\tau^{sep}_G(I_1)$ and $\tau^{sep}_G(I_2)$. Furthermore, $|J| + |K'| < a + 1$ and $|J| + 1 < a + 1$ so,
by the inductive hypothesis, $\tau_A(I_1)$ and $\tau_A(I_2)$. By virtue of the combination rule
$1', 1' \to 2$ (here we use $A \neq 0$), $\tau_A(I_1)$ and $\tau_A(I_2)$ together imply $\tau_A(I)$.
QED

Theorem 18 (Classical d-Separation Theorem) Suppose $G \in UG(\underline{x}.)$ and $P \in pd(St_{\underline{x}.})$.
If $P \neq 0$ and $P$ factors according to $G$, then $\mathcal{I}_{glo}(G) \subset \mathcal{I}(P)$.

proof: Follows from Theorems 14 and 16.
QED

Theorem 19 (Quantum d-Separation Theorem) Suppose $G \in UG(\underline{x}.)$ and $\mu \in dm(\mathcal{H}_{\underline{x}.})$
is a meta density matrix of the form $\mu = proj(\sum_{x.} A(x.)|x.\rangle)$. If $A \neq 0$ and $A$ factors
according to $G$, then $\mathcal{I}_{glo}(G) \subset \mathcal{I}(A)$.

proof: Follows from Theorems 15 and 17.
QED

9 d-Separation and Quantum Entanglement 

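In this section, we show that the quantum d-separation rules for Bayesian and Markov
graphs can be used to detect pairs $(\underline{x}_J, \underline{x}_K)$ in a graph that are unentangled.
For $G \in DAG(\underline{x}.)$, define

$$\mathcal{D}_{glo}(G) = \{ (\underline{x}_J, \underline{x}_K) : J, K \subset Z_{1,N} \text{ are disjoint};\ \tau^{sep}_G(\underline{x}_J \perp \underline{x}_K | \underline{x}_{Z_{1,N} - J - K}) \} . \tag{188}$$

For $G \in UG(\underline{x}.)$, define $\mathcal{D}_{glo}(G)$ in the same way. The function $\tau^{sep}_G$ has been defined
previously. Its definition is different for DAGs than for UGs.

For a meta density matrix $\mu = proj(\sum_{x.} A(x.)|x.\rangle)$, define

$$\mathcal{D}(A) = \{ (\underline{x}_J, \underline{x}_K) : J, K \subset Z_{1,N} \text{ are disjoint};\ E^{CMI}_\mu(\underline{x}_J : \underline{x}_K) = 0 \} . \tag{189}$$

Theorem 20 Suppose $G \in DAG(\underline{x}.)$ (ditto, $G \in UG(\underline{x}.)$) and $\mu \in dm(\mathcal{H}_{\underline{x}.})$ is a
meta density matrix of the form $\mu = proj(\sum_{x.} A(x.)|x.\rangle)$.
If $A$ factors according to $G$, then $\mathcal{D}_{glo}(G) \subset \mathcal{D}(A)$.

proof: Assume $A$ factors according to $G \in DAG(\underline{x}.)$ (ditto, $G \in UG(\underline{x}.)$). Let
$J, K \subset Z_{1,N}$ be disjoint sets. Let $E = Z_{1,N} - J - K$. Let $I = (\underline{x}_J \perp \underline{x}_K | \underline{x}_E)$. Assume
$\tau^{sep}_G(I)$. The quantum d-separation theorem, namely Theorem 11 (ditto, Theorem
19), tells us that if $\tau^{sep}_G(I)$, then $\tau_A(I)$. But we know from Theorem 5 that, because
$I$ is all-encompassing, $\tau_A(I) = \tau_{CMI}(I)$. It follows that $S_{diag(\mu)}(\underline{x}_J : \underline{x}_K | \underline{x}_E) = 0$.
This and the definition of CMI entanglement imply that $E^{CMI}_\mu(\underline{x}_J : \underline{x}_K) = 0$.
QED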

Suppose $J \subset Z_{1,N}$, and we are given a density matrix $\rho \in dm(\mathcal{H}_{\underline{x}_J})$ with a
generalized purification $\mu \in dm(\mathcal{H}_{\underline{x}.})$. Suppose $J_1, J_2 \subset J$ are disjoint sets, and we
want to decide whether $E^{CMI}_\rho(\underline{x}_{J_1} : \underline{x}_{J_2})$ vanishes. Note that to apply Theorem 20
we should first replace $\mu$ by a traced purification $\tilde{\mu}$ of $\rho$. The reason is that we are
interested in $E^{CMI}_\rho(\underline{x}_{J_1} : \underline{x}_{J_2})$. This quantity is not necessarily equal to $E^{CMI}_\mu(\underline{x}_{J_1} : \underline{x}_{J_2})$,
but it is always equal to $E^{CMI}_{\tilde{\mu}}(\underline{x}_{J_1} : \underline{x}_{J_2})$.

In a nutshell, Theorem 20 tells us that, if $J, K \subset Z_{1,N}$ are disjoint sets, and
$\tau^{sep}_G(\underline{x}_J \perp \underline{x}_K | \underline{x}_{Z_{1,N} - J - K})$, then $E^{CMI}_\mu(\underline{x}_J : \underline{x}_K) = 0$. And now, some examples. Let
$k = ?$ mean that we can't conclude anything about the value of $k$. The Bayesian nets

(x) - (a) -> (y) (190) 

and 

GsO - GO - (v) (i9i) 

both have $E^{CMI}_\mu(\underline{x} : \underline{y}) = 0$ because $\underline{a}$ can be grounded and $S_\mu(\underline{x} : \underline{y} | \underline{a}) = 0$. On the
other hand, the Bayesian net

eS 

(x) <-(&)-> (y) (192) 

is equivalent to $\mu$ = (x) (y), for which $S_\mu(\underline{x} : \underline{y}) = ?$, so $E^{CMI}_\mu(\underline{x} : \underline{y}) = ?$. The
Bayesian net

(x) -> (a) <- (y) (193) 

also has $E^{CMI}_\mu(\underline{x} : \underline{y}) = ?$, because grounding $\underline{a}$ allows transmission of information
between $\underline{x}$ and $\underline{y}$.

A parting observation: Suppose $J, K_1, K_2 \subset Z_{1,N}$ are disjoint sets. Let
$D_1 = (\underline{x}_J, \underline{x}_{K_1})$, $D_2 = (\underline{x}_J, \underline{x}_{K_2})$, and $D = (\underline{x}_J, \underline{x}_{K_1 \cup K_2})$. It's not hard to convince oneself
that $[D \in \mathcal{D}_{glo}(G)] \iff [D_1, D_2 \in \mathcal{D}_{glo}(G)]$. By the synergism of entanglement,
$[D \in \mathcal{D}(A)] \Rightarrow [D_1, D_2 \in \mathcal{D}(A)]$, but the opposite implication does not appear to be true.
If we define a perfect graph as one for which $\mathcal{D}_{glo}(G) = \mathcal{D}(A)$, then it appears that
not all graphs are perfect.

A Appendix: Möbius Inversion

In this appendix, we prove the Möbius Inversion Theorem.

Some preliminary observations will facilitate our proof.

Figure 6: Table with rows and columns labelled by all the subsets of the set $J = \{a, b, c\}$.
We want to sum over the shaded entries of this table.



For any finite set $J$, consider a table of arbitrary complex numbers, where the
rows and columns of the table are both labelled by the elements of $2^J$. Suppose we
want to sum over the elements of the table that are below its main diagonal. Fig. 6
illustrates the table for $J = \{a, b, c\}$. The shaded entries of Fig. 6 are the entries we
want to sum over. Two simple methods for carrying out such a sum are: (1) sum first
over rows and then over columns, (2) sum first over columns and then over rows. Of
course, whether we use method (1) or (2), the final value of the sum will not change.
This simple observation, that the final value of the sum does not depend on the order
of summation, can be stated more formally as

$$\sum_{J': J' \subset J}\ \sum_{J'': J'' \subset J'} = \sum_{J'': J'' \subset J}\ \sum_{J': J'' \subset J' \subset J} . \tag{194}$$

By $\sum_{J': J' \subset J}$ we mean the sum over all subsets $J'$ of $J$, including the empty set and
$J$. Note that we use $J', J''$ (i.e., $J$ with one or more primes) to denote subsets of $J$.
Another simple observation is that

$$\sum_{D': D' \subset D} (-1)^{|D'|} = \delta(D, \emptyset) . \tag{195}$$

Figure 7: All subsets of $\{a, b, c\}$, arranged on a lattice.

For example, suppose $D = \{a, b, c\}$. Fig. 7 lists all the subsets $D'$ of $D$. It associates
each distinct $D'$ with a different node of a lattice. (Subsets with the same number of
elements are in the same horizontal level. Subsets in lower horizontal levels have more
elements. Links connect subsets that differ only by one element.) If we sum $(-1)^{|D'|}$
over all the nodes of the lattice of Fig. 7 we get zero, since the number of even-order
subsets equals the number of odd-order subsets. This is true for any set $D$ except for
the empty set, which has only a single even-order subset, itself. Eq. (195) yields



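$$\sum_{J': J'' \subset J' \subset J} (-1)^{|J' - J''|} = \sum_{\Delta J': \emptyset \subset \Delta J' \subset \Delta J = J - J''} (-1)^{|\Delta J'|} \tag{196a}$$

$$= \delta(J, J'') . \tag{196b}$$

If $J'' \subset J' \subset J$, then $|J - J'| + |J' - J''| + |J''| = |J|$ so

$$\sum_{J': J'' \subset J' \subset J} (-1)^{|J - J'|} = \sum_{J': J'' \subset J' \subset J} (-1)^{|J| - |J''| - |J' - J''|} \tag{197a}$$

$$= \delta(J, J'') . \tag{197b}$$

Theorem 21 For any set $J$, and any functions $f, g : 2^J \to C$,

$$g(J) = \sum_{J': J' \subset J} (-1)^{|J - J'|} f(J') \tag{198a}$$

if and only if

$$f(J) = \sum_{J': J' \subset J} g(J') . \tag{198b}$$
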

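proof:

($\Rightarrow$)

$$\sum_{J': J' \subset J} g(J') = \sum_{J': J' \subset J}\ \sum_{J'': J'' \subset J'} (-1)^{|J' - J''|} f(J'') \tag{199a}$$

$$= \sum_{J'': J'' \subset J}\ \sum_{J': J'' \subset J' \subset J} (-1)^{|J' - J''|} f(J'') \tag{199b}$$

$$= f(J) \tag{199c}$$

($\Leftarrow$)

$$\sum_{J': J' \subset J} (-1)^{|J - J'|} f(J') = \sum_{J': J' \subset J} (-1)^{|J - J'|} \sum_{J'': J'' \subset J'} g(J'') \tag{200a}$$

$$= \sum_{J'': J'' \subset J}\ \sum_{J': J'' \subset J' \subset J} (-1)^{|J - J'|} g(J'') \tag{200b}$$

$$= g(J) \tag{200c}$$

QED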
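
The theorem, together with the alternating-sum identity of Eq. (195), can be spot-checked numerically. The sketch below is only an illustration: the set `J`, the randomly chosen complex function `f`, and the helper name `subsets` are assumptions made for the example. It applies Eq. (198a) on every subset of `J`, then Eq. (198b), and compares the result with the original `f`.

```python
from itertools import combinations
import random

def subsets(s):
    """Yield every subset of s (including the empty set and s) as a frozenset."""
    s = tuple(s)
    for r in range(len(s) + 1):
        for c in combinations(s, r):
            yield frozenset(c)

J = frozenset("abc")
rng = random.Random(0)
# Arbitrary complex-valued function f on the power set of J.
f = {S: complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for S in subsets(J)}

# Eq. (198a), applied to every subset S of J.
g = {S: sum((-1) ** len(S - Sp) * f[Sp] for Sp in subsets(S)) for S in subsets(J)}

# Eq. (198b): summing g over the subsets of S should give back f(S).
f_back = {S: sum(g[Sp] for Sp in subsets(S)) for S in subsets(J)}
max_err = max(abs(f_back[S] - f[S]) for S in f)
print(max_err)        # zero up to rounding error

# Eq. (195): the alternating sum is 1 for the empty set and 0 otherwise.
alt = {D: sum((-1) ** len(Dp) for Dp in subsets(D)) for D in subsets(J)}
print(alt[frozenset()], alt[J])
```

A numerical check on one small set is of course no substitute for the proof; it only confirms that the signs and summation ranges are applied consistently.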